Abstract


The automata arising from the well-known conversion of regular expressions to nondeterministic automata have rather particular transition graphs. We refer to them as the Glushkov graphs, to honour his nice expression-to-automaton algorithmic shortcut [S]. The Glushkov graphs have been characterized in terms of simple graph-theoretical properties and certain reduction rules. We show how to carry this characterization over, under certain restrictions, to the weighted Glushkov graphs. With weights in a semiring K, they are defined as the transition Glushkov K-graphs of the Weighted Finite Automata (WFA) obtained by the generalized Glushkov construction from K-expressions. The characterization works provided that the semiring K is factorial and the K-expressions are in the so-called star normal form (SNF) of Brüggemann-Klein [2]. The restriction to factorial semirings ensures that the characterization yields algorithms. The restriction to SNF would not be necessary if every K-expression were equivalent to one in SNF with the same literal length, as is the case for the Boolean semiring B; this remains an open question for a general K.


Keywords: Formal languages, weighted automata, K-expressions. 


1 Introduction 


The extension of Boolean algorithms (over languages) to multiplicities (over series) has always been a central topic in theoretical research. First, Schützenberger established an equivalence between rational and recognizable series, extending the classical result of Kleene [11]. Many contributions have since been made in this area; an overview is presented by Sakarovitch in [14]. Many research works have focused on producing small WFAs. For example, Caron and Flouret have extended the Glushkov construction to WFAs [4]. Champarnaud et al. have designed a quadratic algorithm [?] for computing the equation WFA of a K-expression. This equation WFA has been introduced by Lombardy and Sakarovitch as an extension of Antimirov's algorithm [12] based on partial derivatives.

Moreover, the Glushkov WFA of a K-expression with $n$ occurrences of symbols (we say that its alphabetic width is equal to $n$) has only $n + 1$ states; the equation K-automaton (which is a quotient of the Glushkov automaton) has at most $n + 1$ states.

Conversely, classical algorithms compute K-expressions whose size is exponential with respect to the number of states of the WFA. An example is the block decomposition algorithm proven in [1].

In this paper, we also address the problem of computing short K-expressions, and we focus on a specific kind of conversion based on Glushkov automata. Indeed, the particularity of Glushkov automata is the following: any regular expression of width $n$ can be turned into its Glushkov $(n+1)$-state automaton; conversely, if an $(n+1)$-state automaton is a Glushkov one, then it can be turned into an expression of width $n$. The latter property is based on the characterization of the family of Glushkov automata in terms of graph properties presented in [5]. These properties are stability, transversality and reducibility. Brüggemann-Klein defines regular expressions in Star Normal Form (SNF) [2]. These expressions are characterized by underlying Glushkov automata in which each edge is generated exactly once. We extend this definition to multiplicities. The study of the SNF case would not be necessary if all K-expressions were equivalent to some in SNF with the same literal length, as is the case for the Boolean semiring $\mathbb{B}$.

The aim of this paper is to extend the characterization of Glushkov automata to the multiplicity case in order to compute a K-expression of width $n$ from an $(n+1)$-state WFA. This extension requires restricting the work to factorial semirings as well as to K-expressions in Star Normal Form.

We exhibit a procedure that, given a WFA $M$ over a factorial semiring K, outputs the following: either $M$ is obtained by the Glushkov algorithm from a proper K-expression $E$ in Star Normal Form, and the procedure computes a K-expression $F$ equivalent to $E$; or $M$ is not obtained in that way, and the procedure says no.

The following section recalls fundamental notions concerning automata, expressions and the Glushkov conversion for both the Boolean and the multiplicity cases. An error in the paper by Caron and Ziadi [5] is pointed out and corrected. Section 3 is devoted to the reduction rules for acyclic K-graphs. Their effectiveness is ensured by the confluence of the K-rules. The next section gives orbit properties for Glushkov K-graphs. Section 5 presents the algorithms computing a K-expression from a Glushkov K-graph and details an example.


2 Definitions 


2.1 Classical notions 


Let $\Sigma$ be a finite set of letters (alphabet), $\varepsilon$ the empty word and $\emptyset$ the empty set. Let $(K, \oplus, \otimes)$ be a zero-divisor free semiring where $0$ is the neutral element of $(K, \oplus)$ and $1$ the neutral element of $(K, \otimes)$. The semiring K is said to be zero-divisor free [9] if $0 \neq 1$ and if $\forall x, y \in K$, $x \otimes y = 0 \Rightarrow x = 0$ or $y = 0$.

A formal series [i] is a mapping $S$ from $\Sigma^*$ into K, usually denoted by $S = \sum_{w \in \Sigma^*} S(w)\,w$, where $S(w) \in K$ is the coefficient of $w$ in $S$. The support of $S$ is the language $\mathrm{Supp}(S) = \{w \in \Sigma^* \mid S(w) \neq 0\}$.

In [12], Lombardy and Sakarovitch explain in detail the computation of K-expressions. We have followed their model of grammar. Our constant symbols are $\varepsilon$, the empty word, and $\emptyset$. The binary rational operations are still $+$ and $\cdot$; the unary ones are the Kleene closure $^*$, the positive closure $^+$ and, for every $k \in K$, the multiplication $\times$ of an expression by $k$ on the left or on the right. For easier reading, we will write $kE$ (respectively $Ek$) for $k \times E$ (respectively $E \times k$). Notice that our definition of K-expressions, the set of which is denoted by $E_K$, introduces the positive closure operator. This operator preserves rationality under the same conditions (see below) as the Kleene closure.

K-expressions are then given by the following grammar:

$$E \to a \in \Sigma \mid \emptyset \mid \varepsilon \mid (E + E) \mid (E \cdot E) \mid (E^+) \mid (E^*) \mid (kE),\ k \in K \mid (Ek),\ k \in K$$


Notice that parentheses will be omitted when not necessary. The expressions $E^+$ and $E^*$ are called closure expressions. If a series $S$ is represented by a K-expression $E$, then we denote by $c(S)$ (or $c(E)$) the coefficient of the empty word in $S$. A K-expression $E$ is valid if, for each closure subexpression $F^*$ or $F^+$ of $E$, $\sum_{i=0}^{\infty} c(F)^i \in K$.

A K-expression $E$ is proper if, for each closure subexpression $F^*$ or $F^+$ of $E$, $c(F) = 0$.

We denote by $\mathcal{E}_K$ the set of proper K-expressions. Rational series can then be defined as formal series expressed by proper K-expressions. For $E$ in $\mathcal{E}_K$, $\mathrm{Supp}(E)$ is the support of the rational series defined by $E$.

The length of a K-expression $E$, denoted by $\|E\|$, is the number of occurrences of letters and of $\varepsilon$ appearing in $E$. By contrast, the literal length, denoted by $|E|$, is the number of occurrences of letters in $E$. For example, the expression $E = (a + 3\varepsilon)(b + 2\varepsilon) + (-1)\varepsilon$ has a length of 5 and a literal length of 2.

A weighted finite automaton (WFA) on a zero-divisor free semiring K over an alphabet $\Sigma$ [6] is a 5-tuple $(\Sigma, Q, I, F, \delta)$ where $Q$ is a finite set of states and $I$, $F$ and $\delta$ are mappings $I : Q \to K$ (input weights), $F : Q \to K$ (output weights) and $\delta : Q \times \Sigma \times Q \to K$ (transition weights). The set of WFAs on K is denoted by $\mathcal{M}_K$. A WFA is homogeneous if all edges reaching a same state are labeled by the same letter.


A K-graph is a graph $G = (X, U)$ labeled with coefficients in K, where $X$ is the set of vertices and $U : X \times X \to K$ is the function that associates each edge with its label in K. When there is no edge from $p$ to $q$, we have $U(p, q) = 0$. In case $K = \mathbb{B}$, the Boolean semiring, $E_{\mathbb{B}}$ is the set of regular expressions and, as the only element of $K \setminus \{0\}$ is 1, we omit coefficients and the external product ($1a = a1 = a$). For a rational series $S$ represented by $E \in E_{\mathbb{B}}$, $\mathrm{Supp}(E)$ is usually called the language of $E$, denoted by $L(E)$, and $S = \mathrm{Supp}(S) = L(E)$. A Boolean automaton (automaton in the sequel) $M$ over an alphabet $\Sigma$ is usually defined [6] as a 5-tuple $(\Sigma, Q, I, F, \delta)$ where $Q$ is a finite set of states, $I \subseteq Q$ the set of initial states, $F \subseteq Q$ the set of final states, and $\delta \subseteq Q \times \Sigma \times Q$ the set of edges. We denote by $L(M)$ the language recognized by the automaton $M$. A graph $G = (X, U)$ is a $\mathbb{B}$-graph whose edge labels are not written.


2.2 Extended Glushkov construction 


An algorithm given by Glushkov for computing an automaton with $n + 1$ states from a regular expression of literal length $n$ has been extended to semirings K by the authors [4]. Informally, the principle is to associate exactly one state of the computed automaton with each occurrence of a letter in the expression. Then, two states of the automaton are linked by a transition if the two occurrences of the corresponding letters in the expression can be read successively.

In order to recall the extended Glushkov construction, we first have to define ordered pairs and the operations they support. An ordered pair $(l, i)$ consists of a coefficient $l \in K \setminus \{0\}$ and a position $i \in \mathbb{N}$. We also define the functions $\mathbb{I}_H : \mathbb{N} \to K$ such that $\mathbb{I}_H(i)$ is equal to 1 if $i \in H$ and 0 otherwise. We define $P : 2^{(K \setminus \{0\}) \times \mathbb{N}} \to 2^{\mathbb{N}}$, the function that extracts positions from a set of ordered pairs, as follows: for $Y$ a set of ordered pairs, $P(Y) = \{i \mid \exists l \in K \setminus \{0\},\ (l, i) \in Y\}$.

The function $\mathrm{Coeff}_Y : P(Y) \to K \setminus \{0\}$ extracts the coefficient associated with a position $i$ as follows: $\mathrm{Coeff}_Y(i) = l$ for $(l, i) \in Y$.

Let $Y, Z \subseteq (K \setminus \{0\}) \times \mathbb{N}$ be two sets of ordered pairs. We define the product of $k \in K \setminus \{0\}$ and $Y$ by $kY = \{(k \otimes l, i) \mid (l, i) \in Y\}$ and $Yk = \{(l \otimes k, i) \mid (l, i) \in Y\}$, and we set $0 \cdot Y = Y \cdot 0 = \emptyset$. We define the operation $\uplus$ by

$$Y \uplus Z = \{(l, i) \mid \text{either } (l, i) \in Y \text{ and } i \notin P(Z), \text{ or } (l, i) \in Z \text{ and } i \notin P(Y), \text{ or } (l_Y, i) \in Y,\ (l_Z, i) \in Z \text{ for some } l_Y, l_Z \in K \text{ with } l = l_Y \oplus l_Z \neq 0\}.$$
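The operations on ordered pairs can be illustrated by a minimal Python sketch. It assumes, for illustration only, that K is the commutative ring $(\mathbb{Z}, +, \times)$, which is zero-divisor free, and it encodes a set of ordered pairs $Y$ as a dictionary mapping each position $i \in P(Y)$ to $\mathrm{Coeff}_Y(i)$; all function names are hypothetical.

```python
from typing import Dict, Set

Pairs = Dict[int, int]  # {position i: coefficient l}, with l != 0


def positions(Y: Pairs) -> Set[int]:
    """P(Y): the positions occurring in Y."""
    return set(Y)


def coeff(Y: Pairs, i: int) -> int:
    """Coeff_Y(i), defined only for i in P(Y)."""
    return Y[i]


def lmul(k: int, Y: Pairs) -> Pairs:
    """kY; 0.Y is empty, and k*l != 0 otherwise since Z is zero-divisor free."""
    return {} if k == 0 else {i: k * l for i, l in Y.items()}


def rmul(Y: Pairs, k: int) -> Pairs:
    """Yk; the order of the product only matters for non-commutative K."""
    return {} if k == 0 else {i: l * k for i, l in Y.items()}


def uplus(Y: Pairs, Z: Pairs) -> Pairs:
    """Y (+) Z: unshared positions are kept, coefficients of shared positions
    are added, and a shared position is dropped when its sum is 0."""
    out = dict(Y)
    for i, l in Z.items():
        s = out.get(i, 0) + l
        if s == 0:
            out.pop(i, None)
        else:
            out[i] = s
    return out


print(uplus({1: 2, 2: 1}, {2: -1, 3: 4}))  # {1: 2, 3: 4}: position 2 cancels
```

The cancellation of position 2 in the example is exactly the situation that the third case of the definition of $\uplus$ excludes from the result.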

As in the original Glushkov construction [7] [13], and in order to specify their position in the expression, letters are subscripted following the order of reading. The resulting expression is denoted $\bar{E}$, defined over the alphabet of indexed symbols $\bar{\Sigma}$, each one appearing at most once in $\bar{E}$. The set of indices thus obtained is called the set of positions and denoted by $\mathrm{Pos}(E)$. For example, starting from $E = (2a + b)^* \cdot a \cdot 3b$, one obtains the indexed expression $\bar{E} = (2a_1 + b_2)^* \cdot a_3 \cdot 3b_4$, $\bar{\Sigma} = \{a_1, b_2, a_3, b_4\}$ and $\mathrm{Pos}(E) = \{1, 2, 3, 4\}$. Four functions are defined in order to compute a WFA, which need not be deterministic. $\mathrm{First}(\bar{E})$ represents the set of initial positions of words of $\mathrm{Supp}(\bar{E})$ associated with their input weights, $\mathrm{Last}(\bar{E})$ represents the set of final positions of words of $\mathrm{Supp}(\bar{E})$ associated with their output weights, and $\mathrm{Follow}(\bar{E}, i)$ is the set of positions of words of $\mathrm{Supp}(\bar{E})$ which immediately follow position $i$ in the expression $\bar{E}$, associated with their transition weights. In the Boolean case, these sets are subsets of $\mathrm{Pos}(E)$. $\mathrm{Null}(\bar{E})$ represents the coefficient of the empty word. The way to compute these sets is completely formalized in Table 1.
Table 1: Extended Glushkov functions
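These functions allow us to define the WFA $\bar{M} = (\bar{\Sigma}, Q, \{s_I\}, F, \bar{\delta})$ where

1. $\bar{\Sigma}$ is the indexed alphabet,
2. $s_I$ is the single initial state, with no incoming edge and with 1 as input weight,
3. $Q = \mathrm{Pos}(\bar{E}) \cup \{s_I\}$,
4. $F : Q \to K$ is such that $F(i) = \mathrm{Null}(\bar{E})$ if $i = s_I$ and $F(i) = \mathrm{Coeff}_{\mathrm{Last}(\bar{E})}(i)$ otherwise,
5. $\bar{\delta} : Q \times \bar{\Sigma} \times Q \to K$ is such that $\bar{\delta}(i, a_j, h) = 0$ for every $h \neq j$, whereas

$$\bar{\delta}(i, a_j, j) = \begin{cases} \mathrm{Coeff}_{\mathrm{First}(\bar{E})}(j) & \text{if } i = s_I,\\ \mathrm{Coeff}_{\mathrm{Follow}(\bar{E}, i)}(j) & \text{otherwise.} \end{cases}$$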


The Glushkov WFA $M = (\Sigma, Q, \{s_I\}, F, \delta)$ of $E$ is computed from $\bar{M}$ by replacing the indexed letters on edges by the corresponding letters in the expression $E$. We will denote by $A_K : \mathcal{E}_K \to \mathcal{M}_K$ the application such that $A_K(E)$ is the Glushkov WFA obtained from $E$ by this algorithm, proved in [4].
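Since Table 1 is not reproduced here, the following Python sketch shows one way to compute the four functions recursively. It assumes the usual extended Glushkov rules for proper K-expressions as we read them from [4] (for instance $\mathrm{Follow}(F^*, i) = \mathrm{Follow}(F, i) \uplus \mathrm{Coeff}_{\mathrm{Last}(F)}(i)\,\mathrm{First}(F)$, the rule used in the proof of Lemma 4), takes K to be $\mathbb{Z}$, and uses a hypothetical tuple encoding of indexed expressions; it is a sketch, not the authors' implementation. The usage example is the indexed expression $\bar{E} = (2a_1 + b_2)^* \cdot a_3 \cdot 3b_4$ given above.

```python
from typing import Dict, Tuple

Pairs = Dict[int, int]  # {position: non-zero coefficient}


def lmul(k: int, Y: Pairs) -> Pairs:
    return {} if k == 0 else {i: k * l for i, l in Y.items()}


def rmul(Y: Pairs, k: int) -> Pairs:
    return {} if k == 0 else {i: l * k for i, l in Y.items()}


def uplus(Y: Pairs, Z: Pairs) -> Pairs:
    out = dict(Y)
    for i, l in Z.items():
        s = out.get(i, 0) + l
        if s == 0:
            out.pop(i, None)
        else:
            out[i] = s
    return out


def glushkov(E) -> Tuple[int, Pairs, Pairs, Dict[int, Pairs]]:
    """Return (Null, First, Last, Follow) of an indexed proper K-expression.
    Hypothetical encoding: ('eps',), ('sym', i), ('+', F, G), ('.', F, G),
    ('*', F), ('plus', F), ('lmul', k, F) for kF, ('rmul', F, k) for Fk."""
    op = E[0]
    if op == 'eps':
        return 1, {}, {}, {}
    if op == 'sym':
        return 0, {E[1]: 1}, {E[1]: 1}, {E[1]: {}}
    if op == 'lmul':  # kF: k weights the input side
        n, f, l, fl = glushkov(E[2])
        return E[1] * n, lmul(E[1], f), l, fl
    if op == 'rmul':  # Fk: k weights the output side
        n, f, l, fl = glushkov(E[1])
        return n * E[2], f, rmul(l, E[2]), fl
    if op in ('*', 'plus'):
        n, f, l, fl = glushkov(E[1])
        assert n == 0, "closure of a non-proper subexpression"
        fl = dict(fl)
        for i, c in l.items():  # loop back: Coeff_Last(F)(i).First(F)
            fl[i] = uplus(fl[i], lmul(c, f))
        return (1 if op == '*' else 0), f, l, fl
    if op in ('+', '.'):
        nF, fF, lF, flF = glushkov(E[1])
        nG, fG, lG, flG = glushkov(E[2])
        fl = {**flF, **flG}  # positions of F and G are disjoint
        if op == '+':
            return nF + nG, uplus(fF, fG), uplus(lF, lG), fl
        for i, c in lF.items():  # Follow(F.G, i) gains Coeff_Last(F)(i).First(G)
            fl[i] = uplus(fl[i], lmul(c, fG))
        return nF * nG, uplus(fF, lmul(nF, fG)), uplus(rmul(lF, nG), lG), fl
    raise ValueError(f"unknown operator {op!r}")


# (2a1 + b2)* . a3 . 3b4
E = ('.', ('.', ('*', ('+', ('lmul', 2, ('sym', 1)), ('sym', 2))), ('sym', 3)),
     ('lmul', 3, ('sym', 4)))
null, first, last, follow = glushkov(E)
print(null, first, last, follow[3])  # 0 {1: 2, 2: 1, 3: 1} {4: 1} {4: 3}
```

Reading the result against the definition of $\bar{M}$: $s_I$ reaches positions 1, 2 and 3 with weights 2, 1 and 1, position 3 reaches position 4 with weight 3, and only position 4 is final.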

In order to compute a K-graph from a homogeneous WFA $M$, we have to add a new vertex $\Phi$. Then $U$, the set of edges, is obtained from the transitions of $M$ by removing labels and adding directed edges from every final state to $\Phi$. We label the edges to $\Phi$ with the output weights of the final states. The labels of the edges $U(i, p)$, for $i \in Q$ with $I(i) \neq 0$ and $p \in Q \cup \{\Phi\}$, are $\otimes$-multiplied by the input weight $I(i)$ of the initial state $i$ of $M$.
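The construction of the K-graph can be sketched as follows, assuming integer weights and representing a WFA by three dictionaries (input weights, output weights, and transition weights indexed by $(p, a, q)$). The name PHI for the new vertex $\Phi$, the function name, and the choice of multiplying on the left are assumptions of this sketch. Homogeneity guarantees at most one letter per pair $(p, q)$, so dropping labels loses no information.

```python
from typing import Dict, Hashable, Tuple

PHI = "Phi"  # hypothetical name for the added vertex


def k_graph(inputs: Dict[Hashable, int], outputs: Dict[Hashable, int],
            delta: Dict[Tuple[Hashable, str, Hashable], int]) -> Dict[tuple, int]:
    """Build U = {(p, q): weight} from a homogeneous WFA; zero edges omitted."""
    # drop letters: in a homogeneous WFA, (p, q) determines the letter
    U = {(p, q): w for (p, _a, q), w in delta.items() if w != 0}
    # edges to PHI carry the output weights of final states
    U.update({(q, PHI): w for q, w in outputs.items() if w != 0})
    # edges leaving an initial state i are multiplied (on the left) by I(i)
    for i, w in inputs.items():
        if w != 0:
            for edge in [e for e in U if e[0] == i]:
                U[edge] = w * U[edge]
    return U


print(k_graph({0: 2}, {1: 3}, {(0, 'a', 1): 5}))  # {(0, 1): 10, (1, 'Phi'): 3}
```

For a Glushkov WFA the only input weight is $I(s_I) = 1$, so the last loop leaves the weights unchanged.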

In case $M$ is a Glushkov WFA of a K-expression $E$, the K-graph obtained from $M$ is called the Glushkov K-graph of $E$ and is denoted by $G_K(E)$.


2.3 Normal forms and casting operation 


Star normal form and epsilon normal form 


For the Boolean case, Brüggemann-Klein defines regular expressions in Star Normal Form (SNF) [2] as expressions $E$ for which, for each position $i$ of $\mathrm{Pos}(E)$, the unions of sets computed in the $\mathrm{Follow}(E, i)$ function are disjoint. This definition is given only for the usual operators $+$, $\cdot$ and $^*$. We can extend this definition to the positive closure $^+$ as follows:


Definition 1 A B-expression $E$ is in SNF if, for each closure B-subexpression $H^*$ or $H^+$, the SNF conditions (1) $\mathrm{Follow}(H, \mathrm{Last}(H)) \cap \mathrm{First}(H) = \emptyset$ and (2) $\varepsilon \notin L(H)$ hold.


Then, the properties of the star normal form (defined with the positive closure) are preserved.

In the same paper, Brüggemann-Klein also defines the epsilon normal form for the Boolean case. We extend this epsilon normal form to the positive closure operator.


Definition 2 The epsilon normal form for a B-expression $E$ is defined by induction in the following way:

• [$E = \varepsilon$ or $E = a$] $E$ is in epsilon normal form.
• [$E = F + G$] $E$ is in epsilon normal form if $F$ and $G$ are in epsilon normal form and if $\varepsilon \notin L(F) \cap L(G)$.
• [$E = F \cdot G$] $E$ is in epsilon normal form if $F$ and $G$ are in epsilon normal form.
• [$E = F^+$ or $E = F^*$] $E$ is in epsilon normal form if $F$ is in epsilon normal form and $\varepsilon \notin L(F)$.


Theorem 3 ([2]) For each regular expression $E$, there exists a regular expression $E^\bullet$ such that

1. $A_{\mathbb{B}}(E) = A_{\mathbb{B}}(E^\bullet)$,
2. $E^\bullet$ is in SNF,
3. $E^\bullet$ can be computed from $E$ in linear time.


Brüggemann-Klein has given every step of the computation of $E^\bullet$. This computation remains valid with the positive closure: we just have to add for $H^+$ the same rules as for $H^*$. The main steps of the proof are similar.


We extend the star normal form to multiplicities in this way. Let $E$ be a K-expression. $E$ is in SNF if, for every subexpression $H^*$ or $H^+$ in $E$ and for each $x$ in $P(\mathrm{Last}(H))$,

$$P(\mathrm{Follow}(H, x)) \cap P(\mathrm{First}(H)) = \emptyset.$$

We do not have to consider the case of the empty word because $H^*$ and $H^+$ are subexpressions of a proper K-expression, so that $c(H) = 0$.

As an example, let $H = 2a_1^+ + (3b_2)^+$ and $E = H^*$. The expression $E = (2a_1^+ + (3b_2)^+)^*$ is not in SNF because $2 \in P(\mathrm{Last}(H))$ and $2 \in P(\mathrm{Follow}(H, 2)) \cap P(\mathrm{First}(H))$.


The casting operation ~ 


We have to define the casting $\sim : \mathcal{M}_K \to \mathcal{M}_{\mathbb{B}}$. This is similar to the way in which Buchsbaum et al. [3] define the topology of a graph. A WFA $M = (\Sigma, Q, I, F, \delta)$ is cast into an automaton $\tilde{M} = (\Sigma, Q, \tilde{I}, \tilde{F}, \tilde{\delta})$ in the following way: $\tilde{I}, \tilde{F} \subseteq Q$, $\tilde{I} = \{q \in Q \mid I(q) \neq 0\}$, $\tilde{F} = \{q \in Q \mid F(q) \neq 0\}$ and $\tilde{\delta} = \{(p, a, q) \mid p, q \in Q,\ a \in \Sigma \text{ and } \delta(p, a, q) \neq 0\}$. The $\sim$ operation can be extended to K-expressions, $\sim : E_K \to E_{\mathbb{B}}$: the regular expression $\tilde{E}$ is obtained from $E$ by replacing each $k \in K \setminus \{0\}$ by 1. The $\sim$ operation on $E$ is an embedding of K-expressions into regular ones. Nevertheless, the Glushkov $\mathbb{B}$-graph computed from a K-expression $E$ may differ depending on whether the Glushkov construction or the casting operation $\sim$ is applied first. This is due to the properties of K-expressions. For example, let $K = \mathbb{Q}$ and $E = 2a^* + (-2)b^*$ ($E$ is not in epsilon normal form). We then have $\tilde{E} = a^* + b^*$. We can notice that $\widetilde{A_K(E)} \neq A_{\mathbb{B}}(\tilde{E})$ ($E$ does not recognize $\varepsilon$, but $\tilde{E}$ does).
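The casting of a WFA only keeps track of which weights are non-zero. The minimal Python sketch below uses a hypothetical dictionary encoding of the WFA and applies the cast to the Glushkov WFA of $E = 2a^* + (-2)b^*$ (weights worked out by hand from the extended Glushkov rules, with state 0 standing for $s_I$). It shows where $\widetilde{A_K(E)}$ and $A_{\mathbb{B}}(\tilde{E})$ differ: the output weight of $s_I$ is $2 + (-2) = 0$.

```python
from typing import Dict, Hashable, Set, Tuple

Transition = Tuple[Hashable, str, Hashable]


def cast(inputs: Dict[Hashable, int], outputs: Dict[Hashable, int],
         delta: Dict[Transition, int]) -> Tuple[Set, Set, Set]:
    """Casting ~ of a WFA: keep the initial states, final states and
    transitions whose weight is non-zero, and forget the weights."""
    initial = {q for q, w in inputs.items() if w != 0}
    final = {q for q, w in outputs.items() if w != 0}
    edges = {t for t, w in delta.items() if w != 0}
    return initial, final, edges


# Glushkov WFA of 2a* + (-2)b*: states 1 and 2 are the positions of a and b.
inputs = {0: 1}
outputs = {0: 2 + (-2), 1: 1, 2: 1}  # F(s_I) = Null(E) = 0
delta = {(0, 'a', 1): 2, (0, 'b', 2): -2, (1, 'a', 1): 1, (2, 'b', 2): 1}
initial, final, edges = cast(inputs, outputs, delta)
print(sorted(final))  # [1, 2]: s_I is not final, unlike in A_B(a* + b*)
```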


Lemma 4 Let $E$ be a K-expression. If $E$ is in SNF and in epsilon normal form, then $\widetilde{A_K(E)} = A_{\mathbb{B}}(\tilde{E})$.


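Proof We have to show that the automaton obtained by the Glushkov construction for an expression $E$ in $E_K$ has the same edges as the Glushkov automaton for $\tilde{E}$. First, we have $\mathrm{Pos}(E) = \mathrm{Pos}(\tilde{E})$, as $\tilde{E}$ is obtained from $E$ only by deleting coefficients. Let us show that $\mathrm{First}(\tilde{E}) = P(\mathrm{First}(E))$ (states reached from the initial state) by induction on the length of $E$. If $E = \varepsilon$, then $\tilde{E} = \varepsilon$ and $\mathrm{First}(\tilde{E}) = \emptyset = \mathrm{First}(E) = P(\mathrm{First}(E))$. If $E = a \in \Sigma$, then $\tilde{E} = a = E$, $\mathrm{First}(E) = \{(1, 1)\}$ and $P(\mathrm{First}(E)) = \{1\} = \mathrm{First}(\tilde{E})$.

Let $F$ satisfy the hypothesis, and let $E = kF$ with $k \in K \setminus \{0\}$. In this case, $\tilde{E} = \tilde{F}$ and $P(\mathrm{First}(E)) = P(k\,\mathrm{First}(F)) = P(\mathrm{First}(F)) = \mathrm{First}(\tilde{F}) = \mathrm{First}(\tilde{E})$. If $E = Fk$ with $k \in K \setminus \{0\}$, then $\tilde{E} = \tilde{F}$ and $P(\mathrm{First}(E)) = P(\mathrm{First}(F)) = \mathrm{First}(\tilde{F}) = \mathrm{First}(\tilde{E})$.

If $E = F + H$, where $F$ and $H$ satisfy the induction hypothesis, then, as the coefficient of the empty word is 0 for at least one of the two subexpressions $F$ and $H$ (epsilon normal form), we have $\tilde{E} = \tilde{F} + \tilde{H}$ and $\mathrm{First}(\tilde{F} + \tilde{H}) = \mathrm{First}(\tilde{F}) \cup \mathrm{First}(\tilde{H}) = P(\mathrm{First}(F)) \cup P(\mathrm{First}(H))$, which is equal to $P(\mathrm{First}(F + H))$ by induction. We obtain the same result concerning $F \cdot H$, $F^*$ and $F^+$.

The equality $\mathrm{Last}(\tilde{E}) = P(\mathrm{Last}(E))$ is obtained similarly.

The last function used to compute the Glushkov automaton is the Follow function. Let $E$ be a K-expression and $i \in \mathrm{Pos}(E)$. If $E = \varepsilon$, then $\tilde{E} = \varepsilon$ and $\mathrm{Follow}(\tilde{E}, i) = \emptyset = \mathrm{Follow}(E, i) = P(\mathrm{Follow}(E, i))$. If $E = a \in \Sigma$, then $\tilde{E} = E$ and $\mathrm{Follow}(E, i) = \emptyset$. Let $F$ satisfy $\mathrm{Follow}(\tilde{F}, i) = P(\mathrm{Follow}(F, i))$ for all $i \in \mathrm{Pos}(F)$. If $E$ is $kF$ or $Fk$ with $k \in K \setminus \{0\}$, then $P(\mathrm{Follow}(E, i)) = P(\mathrm{Follow}(F, i)) = \mathrm{Follow}(\tilde{F}, i) = \mathrm{Follow}(\tilde{E}, i)$ by hypothesis. If $F$ and $H$ satisfy the induction hypothesis and $E = F + H$ (with $i \in \mathrm{Pos}(F)$ without loss of generality), then $\mathrm{Follow}(F + H, i) = \mathrm{Follow}(F, i)$, so $P(\mathrm{Follow}(E, i)) = \mathrm{Follow}(\tilde{F}, i) = \mathrm{Follow}(\tilde{E}, i)$. We obtain similar results for $E = F \cdot H$, as there is no intersection between the positions of $F$ and $H$. Concerning the star operation, let $E = F^*$, with $\mathrm{Follow}(\tilde{F}, i) = P(\mathrm{Follow}(F, i))$ for all $i \in \mathrm{Pos}(F)$. Then $P(\mathrm{Follow}(F^*, i)) = P(\mathrm{Follow}(F, i) \uplus \mathrm{Coeff}_{\mathrm{Last}(F)}(i)\,\mathrm{First}(F))$. But, as $E$ is in SNF, we know by definition that $P(\mathrm{Follow}(F, i)) \cap P(\mathrm{First}(F)) = \emptyset$, so $P(\mathrm{Follow}(F^*, i)) = \mathrm{Follow}(\tilde{F}^*, i)$. In fact, this means that if there exists a couple $(\alpha, j) \in \mathrm{Follow}(F, i)$, there cannot exist a couple $(\beta, j) \in \mathrm{First}(F)$. Otherwise, the expression would not be in SNF, and it would be possible that $\alpha \oplus \mathrm{Coeff}_{\mathrm{Last}(F)}(i) \otimes \beta = 0$, which would give $j \notin P(\mathrm{Follow}(F^*, i))$ and imply the deletion of an edge. The same reasoning applies to the positive closure operator.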

Hence, if we do not consider the empty word, the casting operation $\sim$ and the Glushkov construction commute.


2.4 Characterization of Glushkov automata in the boolean case 


The aim of the paper by Caron and Ziadi [5] is to characterize Boolean Glushkov graphs. We recall here the definitions which allow us to state the main theorem of their paper. These notions will be necessary to extend this characterization to Glushkov K-graphs.

A hammock is a graph $G = (X, U)$ without a loop if $|X| = 1$; otherwise it has two distinct vertices $i$ and $t$ such that, for any vertex $x$ of $X$, (1) there exists a path from $i$ to $t$ going through $x$, and (2) there is no non-trivial path from $t$ to $x$ nor from $x$ to $i$. Notice that every hammock with at least two vertices has a unique root (the vertex $i$) and anti-root (the vertex $t$).

Let $G = (X, U)$ be a hammock. We define $O = (X_O, U_O) \subseteq G$ as an orbit of $G$ if and only if for all $x$ and $x'$ in $X_O$ there exists a non-trivial path from $x$ to $x'$. The orbit $O$ is maximal if, for each vertex $x \in X_O$ and for each vertex $x' \in X \setminus X_O$, there do not exist both a path from $x$ to $x'$ and a path from $x'$ to $x$. Equivalently, $O \subseteq G$ is a maximal orbit of $G$ if and only if it is a strongly connected component with at least one edge.
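Since maximal orbits are exactly the strongly connected components with at least one edge, they can be computed with any linear-time SCC algorithm. The Python sketch below uses Tarjan's algorithm (a standard choice of ours, not one prescribed by [5]) on a graph encoded as a dictionary of successor sets. The example is the Glushkov graph of $(2a_1 + b_2)^* \cdot a_3 \cdot 3b_4$, with the hypothetical vertex numbers 0 for the initial vertex and 5 for the final one.

```python
from typing import Dict, List, Set


def maximal_orbits(succ: Dict[int, Set[int]]) -> List[Set[int]]:
    """Maximal orbits of a graph given by successor sets: the strongly
    connected components (Tarjan's algorithm) with at least one edge."""
    index: Dict[int, int] = {}
    low: Dict[int, int] = {}
    stack: List[int] = []
    on_stack: Set[int] = set()
    components: List[Set[int]] = []

    def visit(v: int) -> None:
        index[v] = low[v] = len(index)  # discovery order
        stack.append(v)
        on_stack.add(v)
        for w in succ.get(v, set()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of a component
            comp: Set[int] = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.add(w)
                if w == v:
                    break
            components.append(comp)

    vertices = set(succ) | {w for ws in succ.values() for w in ws}
    for v in vertices:
        if v not in index:
            visit(v)
    # a single vertex forms an orbit only if it carries a loop
    return [c for c in components
            if len(c) > 1 or any(v in succ.get(v, set()) for v in c)]


succ = {0: {1, 2, 3}, 1: {1, 2, 3}, 2: {1, 2, 3}, 3: {4}, 4: {5}}
print(maximal_orbits(succ))  # [{1, 2}]: the positions of (2a1 + b2)*
```

As stated informally below, the vertices of the orbit found are exactly the positions of the closure subexpression.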

Informally, in a Glushkov graph obtained from a regular expression E, the set of vertices 
of a maximal orbit corresponds exactly to the set of positions of a closure subexpression 
of E. 

The set of direct successors (respectively direct predecessors) of $x \in X$ is denoted by $Q^+(x)$ (respectively $Q^-(x)$). Let $n_x = |Q^-(x)|$ and $m_x = |Q^+(x)|$. For an orbit $O \subseteq G$, $O^+(x)$ denotes $Q^+(x) \cap (X \setminus X_O)$ and $O^-(x)$ denotes the set $Q^-(x) \cap (X \setminus X_O)$. In other words, $O^+(x)$ is the set of vertices which are directly reached from $x$ and which are not in $O$. By extension, $O^+ = \bigcup_{x \in O} O^+(x)$ and $O^- = \bigcup_{x \in O} O^-(x)$. The sets $\mathrm{In}(O) = \{x \in X_O \mid O^-(x) \neq \emptyset\}$ and $\mathrm{Out}(O) = \{x \in X_O \mid O^+(x) \neq \emptyset\}$ denote the input and the output of the orbit $O$. As $G$ is a hammock, $\mathrm{In}(O) \neq \emptyset$ and $\mathrm{Out}(O) \neq \emptyset$. An orbit $O$ is stable if $\mathrm{Out}(O) \times \mathrm{In}(O) \subseteq U$. An orbit $O$ is transverse if, for all $x, y \in \mathrm{Out}(O)$, $O^+(x) = O^+(y)$ and, for all $x, y \in \mathrm{In}(O)$, $O^-(x) = O^-(y)$.
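Stability and transversality of a given orbit are direct set computations. The following Python sketch, with hypothetical function names and the same successor-set encoding, checks both properties for one orbit given by its vertex set; it does not implement the recursive strong versions defined below.

```python
from typing import Dict, Set


def orbit_interface(succ: Dict[int, Set[int]], orbit: Set[int]):
    """Return O+(x) and O-(x) for every x of the orbit, then In(O) and Out(O)."""
    pred: Dict[int, Set[int]] = {}
    for p, qs in succ.items():
        for q in qs:
            pred.setdefault(q, set()).add(p)
    o_plus = {x: succ.get(x, set()) - orbit for x in orbit}
    o_minus = {x: pred.get(x, set()) - orbit for x in orbit}
    inputs = {x for x in orbit if o_minus[x]}
    outputs = {x for x in orbit if o_plus[x]}
    return o_plus, o_minus, inputs, outputs


def is_stable(succ: Dict[int, Set[int]], orbit: Set[int]) -> bool:
    """Out(O) x In(O) must be included in the set of edges."""
    _, _, inputs, outputs = orbit_interface(succ, orbit)
    return all(y in succ.get(x, set()) for x in outputs for y in inputs)


def is_transverse(succ: Dict[int, Set[int]], orbit: Set[int]) -> bool:
    """All outputs share the same O+(x) and all inputs the same O-(x)."""
    o_plus, o_minus, inputs, outputs = orbit_interface(succ, orbit)
    return (len({frozenset(o_plus[x]) for x in outputs}) <= 1
            and len({frozenset(o_minus[x]) for x in inputs}) <= 1)


good = {0: {1, 2, 3}, 1: {1, 2, 3}, 2: {1, 2, 3}, 3: {4}, 4: {5}}
bad = {0: {1, 2}, 1: {2, 3}, 2: {1, 4}, 3: {5}, 4: {5}}
print(is_stable(good, {1, 2}), is_transverse(good, {1, 2}))  # True True
print(is_stable(bad, {1, 2}), is_transverse(bad, {1, 2}))    # False False
```

In the second graph, the loops $1 \to 1$ and $2 \to 2$ are missing from $\mathrm{Out}(O) \times \mathrm{In}(O)$, and the two outputs leave the orbit towards different vertices.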

An orbit $O$ is strongly stable (respectively strongly transverse) if it is stable (respectively transverse) and if, after deleting the edges in $\mathrm{Out}(O) \times \mathrm{In}(O)$, (1) there does not exist any suborbit $O' \subset O$ or (2) every maximal suborbit of $O$ is strongly stable (respectively strongly transverse). The hammock $G$ is strongly stable (respectively strongly transverse) if (1) it has no orbit or (2) every maximal orbit $O \subseteq G$ is strongly stable (respectively strongly transverse).

If $G$ is strongly stable, then we call graph without orbit of $G$, denoted by $SO(G)$, the acyclic directed graph obtained by recursively deleting, for every maximal orbit $O$ of $G$, the edges in $\mathrm{Out}(O) \times \mathrm{In}(O)$. The graph $SO(G)$ is then reducible if it can be reduced to one vertex by iterated applications of the three following rules:

• Rule R1: If $x$ and $y$ are vertices such that $Q^-(y) = \{x\}$ and $Q^+(x) = \{y\}$, then delete $y$ and define $Q^+(x) := Q^+(y)$.
• Rule R2: If $x$ and $y$ are vertices such that $Q^-(x) = Q^-(y)$ and $Q^+(x) = Q^+(y)$, then delete $y$ and any edge connected to $y$.
• Rule R3: If $x$ is a vertex such that for all $y \in Q^-(x)$, $Q^+(x) \subseteq Q^+(y)$, then delete the edges in $Q^-(x) \times Q^+(x)$.


Theorem 5 ([5]) $G = (X, U)$ is a Glushkov graph if and only if the three following conditions are satisfied:

• $G$ is a hammock.
• Each maximal orbit in $G$ is strongly stable and strongly transverse.
• The graph without orbit $SO(G)$ is reducible.


2.5 The problem of reduction rules 
An erroneous statement in the paper by Caron and Ziadi 


In [5], the definition of the R3 rule is wrong in some cases. Indeed, if we consider the regular expression $E = (x_1 + \varepsilon)(a_2 + \varepsilon) + (x_3 + \varepsilon)(a_4 + \varepsilon)$, the graph obtained from the Glushkov algorithm is as follows:


Let us now try to reduce this graph with the reduction rules as they are defined in [5]. We can see that the sequence of applicable rules is R3, R3 and R1. There is a choice for the vertex on which the first R3 rule is applied but, once this vertex is chosen, the sequence of rules leads to a single graph (up to the numbering of vertices).


Figure 1: Application of R3 on 1, R3 on 2 and R1 on 1 and 2.


We can see that the graph obtained is no longer reducible. This problem is a consequence of the multiple computation of the edge $(0, \Phi)$. In fact, this problem is solved when each edge of the acyclic Glushkov graph is computed only once, which is the case when $E$ is in epsilon normal form.


A new R3 rule for the Boolean case


Let $G = (X, U)$ be an acyclic graph. The rule R3 is as follows:

• If $x \in X$ is a vertex such that for all $y \in Q^-(x)$, $Q^+(x) \subseteq Q^+(y)$, then delete the edge $(q^-, q^+) \in Q^-(x) \times Q^+(x)$ if there does not exist a vertex $z \in X \setminus \{x\}$ such that the following conditions are true:
  - there is neither a path from $x$ to $z$ nor a path from $z$ to $x$,
  - $q^- \in Q^-(z)$ and $q^+ \in Q^+(z)$,
  - $|Q^-(z) \times Q^+(z)| \neq 1$.


The new rule R3 checks whether the conditions of the old R3 rule are verified and, moreover, deletes an edge only if it does not correspond to the $\varepsilon$ of more than one subexpression. The validity of this rule is shown in Proposition [?].


3 Acyclic Glushkov WFA properties 


The definitions of Section 2.4 related to graphs are extended to K-graphs by considering that edges labeled 0 do not exist.

Let us consider a WFA $M$ without orbit. Our aim here is to give conditions on weights in order to check whether $M$ is a Glushkov WFA. Relying on the Boolean characterization, we can deduce that $M$ is homogeneous and that the Glushkov graph of $M$ is reducible.


3.1 K-rules 


K-rules can be seen as an extension of the reduction rules. Each rule is divided into two parts: a graphical condition on edges and a numerical condition on coefficients (except for the KR1-rule, which has no numerical condition). The following definitions allow us to state the numerical constraints for the application of K-rules.

Let $G = (X, U)$ be a K-graph and let $x, y \in X$. Let us now define the set of beginnings of the set $Q^-(x)$ as $B(Q^-(x)) \subseteq Q^-(x)$. A vertex $x^-$ is in $B(Q^-(x))$ if for all $q^-$ in $Q^-(x)$ there is no non-trivial path from $q^-$ to $x^-$. In the same way, we define the set of terminations of $Q^+(x)$ as $T(Q^+(x)) \subseteq Q^+(x)$. A vertex $x^+$ is in $T(Q^+(x))$ if for all $q^+$ in $Q^+(x)$ there is no non-trivial path from $x^+$ to $q^+$.
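Beginnings and terminations only depend on reachability. The minimal Python sketch below assumes an acyclic graph encoded by successor sets (function names hypothetical) and uses the Glushkov graph of $(a_1 + \varepsilon)(b_2 + \varepsilon)$, with 0 for the initial vertex and 3 for $\Phi$, as an example.

```python
from typing import Dict, Set


def reachable(succ: Dict[int, Set[int]], v: int) -> Set[int]:
    """Vertices reachable from v by a non-trivial path."""
    seen: Set[int] = set()
    todo = list(succ.get(v, set()))
    while todo:
        w = todo.pop()
        if w not in seen:
            seen.add(w)
            todo.extend(succ.get(w, set()))
    return seen


def beginnings(succ: Dict[int, Set[int]], x: int) -> Set[int]:
    """B(Q-(x)): predecessors of x not reachable from any predecessor of x."""
    preds = {p for p, qs in succ.items() if x in qs}
    return {v for v in preds if not any(v in reachable(succ, q) for q in preds)}


def terminations(succ: Dict[int, Set[int]], x: int) -> Set[int]:
    """T(Q+(x)): successors of x from which no successor of x is reachable."""
    succs = set(succ.get(x, set()))
    return {v for v in succs if not any(q in reachable(succ, v) for q in succs)}


succ = {0: {1, 2, 3}, 1: {2, 3}, 2: {3}}
print(beginnings(succ, 2), terminations(succ, 1))  # {0} {3}
```

Here vertex 1 is a predecessor of 2 but is reachable from the other predecessor 0, so it is not a beginning; symmetrically, 2 reaches $\Phi$ and is not a termination of $Q^+(1)$.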

We say that $x$ and $y$ are backward equivalent if $Q^-(x) = Q^-(y)$ and there exist $l_x, l_y \in K$ such that for every $q^- \in Q^-(x)$, there exists $\alpha_{q^-} \in K$ such that $U(q^-, x) = \alpha_{q^-} \otimes l_x$ and $U(q^-, y) = \alpha_{q^-} \otimes l_y$. Similarly, we say that $x$ and $y$ are forward equivalent if $Q^+(x) = Q^+(y)$ and there exist $r_x, r_y \in K$ such that for every $q^+ \in Q^+(x)$, there exists $\beta_{q^+} \in K$ such that $U(x, q^+) = r_x \otimes \beta_{q^+}$ and $U(y, q^+) = r_y \otimes \beta_{q^+}$. Moreover, if $x$ and $y$ are both backward and forward equivalent, then we say that $x$ and $y$ are bidirectionally equivalent.
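When K is a commutative field, these definitions reduce to a ratio test: with non-zero weights, $x$ and $y$ are backward equivalent exactly when $U(q^-, y)/U(q^-, x)$ does not depend on $q^-$ (take $l_x = 1$ and $\alpha_{q^-} = U(q^-, x)$). The Python sketch below assumes $K = \mathbb{Q}$ through fractions.Fraction and a hypothetical edge-dictionary encoding of $U$; the non-commutative case, which needs left and right quotients, is not covered.

```python
from fractions import Fraction
from typing import Dict, Hashable, Set, Tuple

Edges = Dict[Tuple[Hashable, Hashable], Fraction]  # U, zero edges omitted


def preds(U: Edges, v) -> Set:
    return {p for (p, q) in U if q == v}


def succs(U: Edges, v) -> Set:
    return {q for (p, q) in U if p == v}


def backward_equivalent(U: Edges, x, y) -> bool:
    """With l_x = 1 and alpha_q = U(q, x), l_y must be the common ratio."""
    if preds(U, x) != preds(U, y):
        return False
    return len({U[(q, y)] / U[(q, x)] for q in preds(U, x)}) <= 1


def forward_equivalent(U: Edges, x, y) -> bool:
    """Symmetric test on successors, with r_x = 1 and beta_q = U(x, q)."""
    if succs(U, x) != succs(U, y):
        return False
    return len({U[(y, q)] / U[(x, q)] for q in succs(U, x)}) <= 1


def bidirectionally_equivalent(U: Edges, x, y) -> bool:
    return backward_equivalent(U, x, y) and forward_equivalent(U, x, y)


U = {(0, 1): Fraction(2), (0, 2): Fraction(4),
     (1, 3): Fraction(3), (2, 3): Fraction(6)}
print(bidirectionally_equivalent(U, 1, 2))  # True
```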

In the same way, we say that $x$ is ε-equivalent if, for all $(q^-, q^+) \in Q^-(x) \times Q^+(x)$, the edge $(q^-, q^+)$ exists and if there exist $k, l, r \in K$ such that for every $q^- \in Q^-(x)$ there exists $\alpha_{q^-} \in K$ and for every $q^+ \in Q^+(x)$ there exists $\beta_{q^+} \in K$ such that $U(q^-, x) = \alpha_{q^-} \otimes l$, $U(x, q^+) = r \otimes \beta_{q^+}$ and $U(q^-, q^+) = \alpha_{q^-} \otimes k \otimes \beta_{q^+}$.

Similarly, $x$ is quasi-ε-equivalent if

• $B(Q^-(x)) \neq Q^-(x)$ or $T(Q^+(x)) \neq Q^+(x)$, and
• for all $(q^-, q^+) \in Q^-(x) \times Q^+(x) \setminus B(Q^-(x)) \times T(Q^+(x))$, the edge $(q^-, q^+)$ exists, and
• there exist $k, l, r \in K$ such that for every $q^- \in Q^-(x)$ there exists $\alpha_{q^-} \in K$ and for every $q^+ \in Q^+(x)$ there exists $\beta_{q^+} \in K$ such that $U(q^-, x) = \alpha_{q^-} \otimes l$, $U(x, q^+) = r \otimes \beta_{q^+}$, and
• if $q^- \notin B(Q^-(x))$ or $q^+ \notin T(Q^+(x))$
  - then $U(q^-, q^+) = \alpha_{q^-} \otimes k \otimes \beta_{q^+}$,
  - else there exists $\gamma \in K$ such that $U(q^-, q^+) = \gamma \oplus \alpha_{q^-} \otimes k \otimes \beta_{q^+}$ (notice that if the edge from $q^-$ to $q^+$ does not exist in the automaton, then $U(q^-, q^+) = 0$, and it is possible to have $\gamma \oplus \alpha_{q^-} \otimes k \otimes \beta_{q^+} = 0$).


In order to clarify our purpose, we have distinguished the case where the edges $(q^-, q^+)$ are superpositions of edges (quasi-ε-equivalence of $x$) from the case where they are not (ε-equivalence of $x$).


Rule KR1: If $x$ and $y$ are vertices such that $Q^-(y) = \{x\}$ and $Q^+(x) = \{y\}$, then delete $y$ and define $Q^+(x) := Q^+(y)$.


Figure 2: KR1 reduction rule


Rule KR2: If $x$ and $y$ are bidirectionally equivalent, with $l_x, l_y, r_x, r_y \in K$ the constants satisfying this definition, then

• delete $y$ and any edge connected to $y$,
• for every $q^- \in Q^-(x)$ and $q^+ \in Q^+(x)$, set $U'(q^-, x) = \alpha_{q^-}$ and $U'(x, q^+) = \beta_{q^+}$,

where $\alpha_{q^-}$ and $\beta_{q^+}$ are defined as in the bidirectional equivalence.


Figure 3: KR2 reduction rule


Rule KR3: If $x$ is ε-equivalent or quasi-ε-equivalent, with $l, r, k, \gamma \in K$ the constants satisfying such a definition, then

• if $x$ is ε-equivalent
  - then delete every edge $(q^-, q^+) \in Q^-(x) \times Q^+(x)$,
  - else delete every edge $(q^-, q^+) \in Q^-(x) \times Q^+(x) \setminus B(Q^-(x)) \times T(Q^+(x))$,
• for every $q^- \in Q^-(x)$ and $q^+ \in Q^+(x)$, set $U'(q^-, x) = \alpha_{q^-}$ and $U'(x, q^+) = \beta_{q^+}$, where $\alpha_{q^-}$ and $\beta_{q^+}$ are defined as in the ε-equivalence or quasi-ε-equivalence,
• if $x$ is quasi-ε-equivalent, then compute the new edges from $B(Q^-(x)) \times T(Q^+(x))$, labeled by $\gamma$.


Figure 4: KR3 reduction when x is ε-equivalent

Figure 5: KR3 reduction when x is quasi-ε-equivalent


3.2 Confluence for K-rules 


In order to have an algorithm checking whether a K-graph is a Glushkov K-graph, we have to know (1) whether it is decidable to apply a K-rule on some vertices and (2) whether the application of K-rules terminates. In order to ensure these properties, we specify some sufficient conditions on the semiring K: K is assumed to be either a field or a factorial semiring. A factorial semiring K is a zero-divisor free semiring for which every non-zero, non-unit element $x$ of K can be written as a product of irreducible elements of K, $x = p_1 \cdots p_n$, and this representation is unique apart from the order of the irreducible elements. This notion is a slight adaptation of the notion of factorial ring.

It is clear that, if K is a field, the application of K-rules is decidable: the conditions of application of K-rules are sufficient to define an algorithm. In the case of a factorial semiring, as the decomposition is unique, a gcd is defined¹, and it gives us a procedure allowing us to apply one rule (KR2 or KR3) on a K-graph whenever possible. This ensures the decidability of K-rule application for factorial semirings. For both cases (field and factorial semiring), we prove that K-rules are confluent. This ensures the termination of the algorithm deciding whether a K-graph is a Glushkov one.


We make explicit the algorithms applying the KR2 and KR3 rules. Algorithm 2 tests whether the KR2-rule graphical and numerical conditions are verified for two states. If so, it returns the partially reduced K-graph. Algorithm 1 is divided into three functions. The first one checks whether the KR3 graphical conditions are satisfied on a state x (KR3GRAPHICALEQUIVALENCECONDITIONSCHECKING) and returns the ε- or quasi-ε-equivalence type of x. Then, depending on the type of x, the numerical conditions for ε- or quasi-ε-equivalence are verified (function EQUIVALENCECHECKING). Finally, a partially reduced K-graph is obtained using the GRAPHCOMPUTING function.


Algorithm 1 Application of the KR3 rule for a state

KR3-APPLICATION(x, G)
  ▷ Input: One state x of a K-graph G = (X, U)
  ▷ Output: The newly computed graph G
  Begin
    if KR3GRAPHICALEQUIVALENCECONDITIONSCHECKING(x, G, type) = False then
      return False
    ▷ If type is equal to ε (resp. quasi-ε), the lines labeled ▷ quasi-ε (resp. ▷ ε)
    ▷ in the functions below are deleted
    if EQUIVALENCECHECKING(x, G, [α], [β], k, [γ]) = False then
      return False
    GRAPHCOMPUTING(x, G, [α], [β], k, [γ])
    return True
  End


¹ In case K is not commutative, left gcds and right gcds are defined.
Algorithm 2 Application of the KR2 rule for two states

KR2-APPLICATION(x, y, G)
  ▷ Input: Two states x and y of a K-graph G = (X, U)
  ▷ Output: The newly computed graph G
  Begin
    if Q^-(x) ≠ Q^-(y) or Q^+(x) ≠ Q^+(y) then
      return False
    q_1^- ← a vertex of Q^-(x)
    gcd_r(x) ← U(q_1^-, x)
    gcd_r(y) ← U(q_1^-, y)
    for each q^- ∈ Q^-(x) do
      gcd_r(x) ← RIGHTGCD(U(q^-, x), gcd_r(x))
      gcd_r(y) ← RIGHTGCD(U(q^-, y), gcd_r(y))
    for each q^- ∈ Q^-(x) do
      compute α_{q^-} such that U(q^-, x) = α_{q^-} ⊗ gcd_r(x)
      if α_{q^-} ⊗ gcd_r(y) ≠ U(q^-, y) then
        return False
    q_1^+ ← a vertex of Q^+(x)
    gcd_l(x) ← U(x, q_1^+)
    gcd_l(y) ← U(y, q_1^+)
    for each q^+ ∈ Q^+(x) do
      gcd_l(x) ← LEFTGCD(U(x, q^+), gcd_l(x))
      gcd_l(y) ← LEFTGCD(U(y, q^+), gcd_l(y))
    for each q^+ ∈ Q^+(x) do
      compute β_{q^+} such that U(x, q^+) = gcd_l(x) ⊗ β_{q^+}
      if gcd_l(y) ⊗ β_{q^+} ≠ U(y, q^+) then
        return False
    delete y and any edge connected to y
    for each q^- ∈ Q^-(x) do
      U(q^-, x) ← α_{q^-}
    for each q^+ ∈ Q^+(x) do
      U(x, q^+) ← β_{q^+}
    return True
  End
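To illustrate Algorithm 2, here is a hedged Python transcription over the factorial semiring $(\mathbb{N}, +, \times)$. Since this semiring is commutative, left and right gcds both coincide with math.gcd. The edge-dictionary encoding is hypothetical and, as a choice of ours, the function returns the factored constants $(\mathrm{gcd}_r(x), \mathrm{gcd}_r(y), \mathrm{gcd}_l(x), \mathrm{gcd}_l(y))$ instead of True, so that a caller could keep them (None plays the role of False). The example is the Glushkov K-graph of $2a_1 + 3b_2$, with 0 for the initial vertex and 3 for $\Phi$.

```python
from functools import reduce
from math import gcd
from typing import Dict, Hashable, Optional, Set, Tuple

Edges = Dict[Tuple[Hashable, Hashable], int]  # U over N, zero edges omitted


def kr2(U: Edges, x, y) -> Optional[Tuple[int, int, int, int]]:
    """Apply KR2 to x and y in place; return the factored constants, or None."""
    def pred(v) -> Set:
        return {p for (p, q) in U if q == v}

    def succ(v) -> Set:
        return {q for (p, q) in U if p == v}

    qm, qp = pred(x), succ(x)
    if qm != pred(y) or qp != succ(y):
        return None
    gx = reduce(gcd, (U[(q, x)] for q in qm), 0)  # gcd(0, n) == n
    gy = reduce(gcd, (U[(q, y)] for q in qm), 0)
    alpha = {q: U[(q, x)] // gx for q in qm}  # U(q, x) = alpha_q * gcd_r(x)
    if any(alpha[q] * gy != U[(q, y)] for q in qm):
        return None
    hx = reduce(gcd, (U[(x, q)] for q in qp), 0)
    hy = reduce(gcd, (U[(y, q)] for q in qp), 0)
    beta = {q: U[(x, q)] // hx for q in qp}  # U(x, q) = gcd_l(x) * beta_q
    if any(hy * beta[q] != U[(y, q)] for q in qp):
        return None
    for edge in [e for e in U if y in e]:  # delete y and its edges
        del U[edge]
    for q, a in alpha.items():
        U[(q, x)] = a
    for q, b in beta.items():
        U[(x, q)] = b
    return gx, gy, hx, hy


U = {(0, 1): 2, (0, 2): 3, (1, 3): 1, (2, 3): 1}
print(kr2(U, 1, 2), U)  # (2, 3, 1, 1) {(0, 1): 1, (1, 3): 1}
```

The constants 2 and 3 removed from the edges are exactly the coefficients of $a$ and $b$ in the expression, which is why this sketch returns them rather than discarding them.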
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GRAPHCOMPUTING(z, G, [a], [6], k, |y]) 
> Input: One state z of a K-graph G = (X,U) 
> o € KI? Gl. g eKO k eK 
> Input: and y € KIB(? (»)IxIT(9* ()I > quasi-c 
> Output: The newly computed graph G 
1 Begin 
for each q7 € Q (x) do 
U(q7, x) + ag- 
for each gt € Q*(x) do 
U(x, aq) m Bat 


%© 


delete any edge (q~, qt) € Q^ (x) x QT (az) Pe 
F) € Q7 (x) x Q*(z) NB(Q-(z)) x T(Q*(z)) > quasi-e 
for each (q~,q*) € B(Q- (x)) x T(Q*(x)) do > quasi-c 
U(q-q e (q, qt) > quasi-c 


3 

4 

5 

6 

7 delete any edge (q^, q^ 
8 

9 
10 


KR3GRAPHICALEQUIVALENCECONDITIONSCHECKING(x, G, type)
> Input: One state x of a K-graph G = (X, U)
> Output: type ∈ {ε-equivalence, quasi-ε-equivalence}
Begin
  compute B(Q⁻(x)) and T(Q⁺(x))
  if B(Q⁻(x)) = Q⁻(x) and T(Q⁺(x)) = Q⁺(x) then
    for each q⁻ ∈ Q⁻(x) do
      for each q⁺ ∈ Q⁺(x) do
        if U(q⁻, q⁺) = 0 then
          return False
    type ← ε
    return True
  else
    for each q⁻ ∈ Q⁻(x) do
      for each q⁺ ∈ Q⁺(x) do
        if (q⁻, q⁺) ∈ Q⁻(x) × Q⁺(x) \ B(Q⁻(x)) × T(Q⁺(x)) and U(q⁻, q⁺) = 0 then
          return False
    type ← quasi-ε
    return True
End
EQUIVALENCECHECKING(x, G, [α], [β], k, [γ])
> Input: One state x of a K-graph G = (X, U)
> Output: α ∈ K^{|Q⁻(x)|}, β ∈ K^{|Q⁺(x)|}, k ∈ K
> Output: γ ∈ K^{|B(Q⁻(x))|×|T(Q⁺(x))|}                                     > quasi-ε
Begin
  q⁻₁ ← a vertex of Q⁻(x)
  gcd_r ← U(q⁻₁, x)
  for each q⁻ ∈ Q⁻(x) do
    gcd_r ← RIGHT GCD(U(q⁻, x), gcd_r)
  for each q⁻ ∈ Q⁻(x) do
    compute α_{q⁻} such that U(q⁻, x) = α_{q⁻} ⊗ gcd_r
  q⁺₁ ← a vertex of Q⁺(x)
  gcd_l ← U(x, q⁺₁)
  for each q⁺ ∈ Q⁺(x) do
    gcd_l ← LEFT GCD(gcd_l, U(x, q⁺))
  for each q⁺ ∈ Q⁺(x) do
    compute β_{q⁺} such that U(x, q⁺) = gcd_l ⊗ β_{q⁺}
  (q⁻₁, q⁺₁) ← a couple of vertices of Q⁻(x) × Q⁺(x)                          > ε
  (q⁻₁, q⁺₁) ← a couple of vertices of Q⁻(x) × Q⁺(x) \ B(Q⁻(x)) × T(Q⁺(x))    > quasi-ε
  find k₁ such that U(q⁻₁, q⁺₁) = α_{q⁻₁} ⊗ k₁ ⊗ β_{q⁺₁}
  if k₁ does not exist then
    return False
  for each (q⁻, q⁺) ∈ Q⁻(x) × Q⁺(x) do
    if (q⁻, q⁺) ∉ B(Q⁻(x)) × T(Q⁺(x)) then                                      > quasi-ε
      find k such that U(q⁻, q⁺) = α_{q⁻} ⊗ k ⊗ β_{q⁺}
      if k does not exist then
        return False
      elif k ≠ k₁ then
        return False
  for each (q⁻, q⁺) ∈ B(Q⁻(x)) × T(Q⁺(x)) do                                    > quasi-ε
    find γ(q⁻, q⁺) such that U(q⁻, q⁺) = α_{q⁻} ⊗ k ⊗ β_{q⁺} ⊕ γ(q⁻, q⁺)      > quasi-ε
    if γ(q⁻, q⁺) does not exist then                                            > quasi-ε
      return False                                                              > quasi-ε
  return True
End
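The function EQUIVALENCECHECKING can be sketched in the same tropical setting (N ∪ {+∞}, min, +), where +∞ plays the role of the zero (no edge), ⊕ is min and ⊗ is +. This is a minimal sketch under those assumptions: the helper name kr3_equivalence is hypothetical, the set bt_pairs stands for B(Q⁻(x)) × T(Q⁺(x)) (empty for an ε-equivalent state), and γ is chosen as the edge weight itself, which is one valid solution of U = (α ⊗ k ⊗ β) ⊕ γ in this semiring.

```python
from typing import Dict, FrozenSet, Tuple

Pair = Tuple[str, str]
INF = float("inf")  # zero of (N ∪ {+inf}, min, +): "no edge"


def kr3_equivalence(u_in: Dict[str, int], u_out: Dict[str, int],
                    u_cross: Dict[Pair, int],
                    bt_pairs: FrozenSet[Pair] = frozenset()):
    """Tropical sketch of EQUIVALENCECHECKING (hypothetical helper).

    u_in[q-] = U(q-, x), u_out[q+] = U(x, q+), u_cross[(q-, q+)] = U(q-, q+).
    bt_pairs stands for B(Q-(x)) x T(Q+(x)): empty for an ε-equivalent x,
    non-empty for a quasi-ε-equivalent x.
    Returns (alpha, beta, k, gamma), or None if the rule does not apply.
    """
    g_r = min(u_in.values())   # right gcd of the weights entering x
    g_l = min(u_out.values())  # left gcd of the weights leaving x
    alpha = {q: w - g_r for q, w in u_in.items()}
    beta = {q: w - g_l for q, w in u_out.items()}
    k = None
    for qm in u_in:
        for qp in u_out:
            if (qm, qp) in bt_pairs:
                continue
            w = u_cross.get((qm, qp), INF)
            if w == INF:       # a missing edge cannot equal α ⊗ k ⊗ β
                return None
            cand = w - alpha[qm] - beta[qp]
            if cand < 0 or (k is not None and cand != k):
                return None    # k does not exist, or differs from k1
            k = cand
    if k is None:
        return None
    gamma = {}
    for qm, qp in bt_pairs:
        w = u_cross.get((qm, qp), INF)
        # min(α ⊗ k ⊗ β, γ) = w is solvable iff w <= α + k + β; take γ = w.
        if w > alpha[qm] + k + beta[qp]:
            return None
        gamma[(qm, qp)] = w
    return alpha, beta, k, gamma
```

With incoming weights (2, 3), outgoing weights (5, 6) and the four crossing edges weighted 1, 2, 2, 3, the sketch finds α = (0, 1), β = (0, 1) and the common scalar k = 1.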


Definition 6 (confluence) Let G be a K-graph and $I_G$ the acyclic graph having only one vertex. Let $R_1$ be a sequence of K-rules such that
$$G \xrightarrow{R_1} I_G.$$
The K-rules are confluent if, for every K-graph $G_0$ such that there exists a sequence of K-rules $R_0$ with $G \xrightarrow{R_0} G_0$, there exists a sequence of K-rules $R_2$ such that
$$G_0 \xrightarrow{R_2} I_G.$$


In what follows, K is a field or a factorial semiring.


Proposition 7 The K-rules are confluent. 


Proof In order to prove this result, we show that if two K-rules can be applied to reduce a Glushkov K-graph, then the order in which they are applied does not modify the resulting K-graph.

Let us denote by $r_{x,y}(G)$ the application of a KR1, KR2 or KR3 rule on the vertices $x$ and $y$, with $y = \emptyset$ for a KR3 rule.

Let $G = (X,U)$ be a Glushkov K-graph and let $r_{x,y}$ and $r_{z,t}$ be two K-rules applicable on $G$ such that $\{x,y\} \cap \{z,t\} = \emptyset$ and no edge can be deleted by both rules. Then necessarily $r_{x,y}(r_{z,t}(G)) = r_{z,t}(r_{x,y}(G))$.

Suppose now that $\{x,y\} \cap \{z,t\} \neq \emptyset$ or that one edge is deleted by both rules. We have to consider several cases depending on the rule $r_{x,y}$.


$r_{x,y}$ is a KR1 rule. In this case, $r_{z,t}$ cannot delete the edge from $x$ to $y$, and $r_{z,t}$ is necessarily a KR1 rule with $\{x,y\} \cap \{z,t\} \neq \emptyset$. If $y = z$, as the coefficient does not act on the reduction rule, $r_{x,y}(r_{y,t}(G)) = r_{y,t}(r_{x,y}(G))$.

$r_{x,y}$ is a KR2 rule. Consider first that $r_{z,t}$ is a KR2 rule with $y = z$. Using the notations of the KR2 rule, there exist $\alpha_{q^-}$, $\beta_{q^+}$, $l_x$, $l_y$, $r_x$, $r_y$ such that $U(q^-,x) = \alpha_{q^-} l_x$, $U(q^-,y) = \alpha_{q^-} l_y$, $U(x,q^+) = r_x \beta_{q^+}$ and $U(y,q^+) = r_y \beta_{q^+}$, with $q^- \in Q^-(x)$, $q^+ \in Q^+(x)$, $l_x = gcd_r(x)$, $l_y = gcd_r(y)$, $r_x = gcd_l(x)$ and $r_y = gcd_l(y)$. By hypothesis, a KR2 rule can also be applied on the vertices $y$ and $t$: there also exist $\alpha'_{q^-}$, $\beta'_{q^+}$, $l'_y$, $l_t$, $r'_y$, $r_t$ such that $U(q^-,y) = \alpha'_{q^-} l'_y$, $U(y,q^+) = r'_y \beta'_{q^+}$, $U(q^-,t) = \alpha'_{q^-} l_t$ and $U(t,q^+) = r_t \beta'_{q^+}$ (with $Q^-(x) = Q^-(t)$ and $Q^+(x) = Q^+(t)$). By construction of $gcd_r(x)$ (Algorithm 2), the left gcd of all the $\alpha_{q^-}$ is 1. Hence, whatever the order of application of the KR2 rules, the same decomposition of the edge values is obtained. The same reasoning applies symmetrically to the right part.

Consider now that $r_{z,t} = r_{z,\emptyset}$ is a KR3 rule. Neither edges from $x$ or $y$ nor edges to $x$ or $y$ can be deleted by $r_{z,\emptyset}$, so $z = x$ or $z = y$. Let $z = y$. Whether we apply $r_{x,y}$ then $r_{y,\emptyset}$, or $r_{y,\emptyset}$ then $r_{x,y}$, on $G$, we obtain the same K-graph, following the same method (function EQUIVALENCECHECKING) as in the previous case. If $z = x$, we also obtain the same K-graph, by commutativity of the sum operator.

$r_{x,y}$ is a KR3 rule. The only case left to consider is $r_{x,y} = r_{x,\emptyset}$ and $r_{z,t} = r_{z,\emptyset}$, both KR3 rules. Suppose that $r_{z,\emptyset}$ deletes an edge also deleted by $r_{x,\emptyset}$ (with $x \neq z$), and let $(q^-,q^+)$ be this edge. Using the notations of the KR3 rule, there exist $\alpha_{q^-}$, $\beta_{q^+}$, $l$, $r$ such that $U(q^-,x) = \alpha_{q^-} l$, $U(x,q^+) = r \beta_{q^+}$ and $U(q^-,q^+) = \alpha_{q^-} k \beta_{q^+} \oplus \gamma$, with $q^- \in Q^-(x)$, $q^+ \in Q^+(x)$, $l = gcd_r(x)$ and $r = gcd_l(x)$. There also exist $\alpha'_{q^-}$, $\beta'_{q^+}$, $l'$, $r'$ such that $U(q^-,z) = \alpha'_{q^-} l'$, $U(z,q^+) = r' \beta'_{q^+}$ and $U(q^-,q^+) = \alpha'_{q^-} k' \beta'_{q^+} \oplus \gamma'$, with $l' = gcd_r(z)$ and $r' = gcd_l(z)$. By construction (function EQUIVALENCECHECKING), the computations of $l$ and $l'$ are independent, and so are those of $r$ and $r'$. We can then choose $\gamma''$ such that $\gamma = \alpha'_{q^-} k' \beta'_{q^+} \oplus \gamma''$ and $\gamma' = \alpha_{q^-} k \beta_{q^+} \oplus \gamma''$. Thus $U(q^-,q^+) = \alpha_{q^-} k \beta_{q^+} \oplus \alpha'_{q^-} k' \beta'_{q^+} \oplus \gamma''$, and it is easy to see that $r_{z,\emptyset}(r_{x,\emptyset}(G)) = r_{x,\emptyset}(r_{z,\emptyset}(G))$.


3.3 K-reducibility 


Definition 8 A K-graph G = (X,U) is said to be K-reducible if it has no orbit and if it can be reduced to one vertex by iterated applications of any of the three rules KR1, KR2, KR3 described above.


Proposition 10 shows the existence of a sequence of K-rules leading to the complete reduction of Glushkov K-graphs. However, the existence of an algorithm allowing us to obtain this sequence of K-rules depends on the semiring K.

In order to show the K-reducibility property of a Glushkov K-graph G, we check (Lemma 9) that every sequence R of K-rules leading to the K-reduction of G necessarily contains two KR1 rules, which will be denoted by $r_o$ and $r_e$.


Lemma 9 Let G = (X,U) be a K-reducible Glushkov K-graph without orbit with $|X| \geq 3$, and let $R = r_1 \cdots r_q$ be the sequence of K-rules which can be applied on G and reduce it. Necessarily, R can be written $R = R' r_o r_e$, with $r_o$ and $r_e$ two KR1 rules merging respectively $s_I$ and $\Phi$.

Proof We show this lemma by induction on the number of vertices of the graph. If $|X| = 3$, the only possible graphs are the following ones:

[Figure: the two possible graphs with |X| = 3]

For the first one, $R = r_o r_e$ with $k = \lambda$ in $r_o$ and $k = \lambda'$ in $r_e$. For the second one, $x$ is ε-equivalent and $R = r\, r_o r_e$, with $r$ a KR3 rule such that $\alpha \leftarrow 1$, $\beta \leftarrow 1$, $l \leftarrow \lambda$, $r \leftarrow \lambda'$ and $k = \lambda''$; then $r_o$ and $r_e$ are KR1 rules with $k = 1$. Suppose now that G has n vertices. As G is K-reducible, there exists a sequence of K-rules which leads to one of the two previous basic cases.
For the reduction process, we associate each vertex of G with a subexpression. We define E(x) to be the expression of the vertex x. At the beginning of the process, E(x) is a, the only letter labelling the edges reaching the vertex x (homogeneity of Glushkov automata). For the vertices $s_I$ and $\Phi$, we define $E(s_I) = E(\Phi) = \varepsilon$. When applying K-rules, we associate a new expression with each new vertex. With the notations of Figure 2, the KR1 rule induces $E(x) \leftarrow E(x) \cdot k \cdot E(y)$ with $k = U(x,y)$. With the notations of Figure 3, the KR2 rule induces $E(x) \leftarrow l_x E(x) r_x + l_y E(y) r_y$. Finally, with the notations of Figures 4 and 5, the KR3 rule induces $E(x) \leftarrow l E(x) r + k$.
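The expression updates induced by the three rules can be sketched with plain string manipulation. The helper names kr1_expr, kr2_expr and kr3_expr are hypothetical; coefficients are passed as strings, an empty string stands for a coefficient that is not printed, and kr3_expr parenthesizes its result so that it can be concatenated afterwards. The small driver replays the intermediate labels of the (N ∪ {+∞}, min, +) example of Section 3.4.

```python
def kr1_expr(ex: str, k: str, ey: str) -> str:
    """KR1: E(x) <- E(x) . k . E(y), with k = U(x, y)."""
    return f"{ex}{k}{ey}"


def kr2_expr(lx: str, ex: str, rx: str, ly: str, ey: str, ry: str) -> str:
    """KR2: E(x) <- l_x E(x) r_x + l_y E(y) r_y (sum left unparenthesized)."""
    return f"{lx}{ex}{rx} + {ly}{ey}{ry}"


def kr3_expr(l: str, ex: str, r: str, k: str) -> str:
    """KR3: E(x) <- l E(x) r + k, parenthesized for later concatenation."""
    return f"({l}{ex}{r} + {k})"


if __name__ == "__main__":
    ex = kr3_expr("2", "x", "5", "6")   # KR3 on x with l = 2, r = 5, k = 6
    ey = kr3_expr("0", "y", "2", "1")   # KR3 on y with l = 0, r = 2, k = 1
    exy = kr1_expr(ex, "", ey)          # KR1 merging the two new vertices
    print(kr2_expr("", exy, "", "2", "z", ""))  # (2x5 + 6)(0y2 + 1) + 2z
```

Keeping the KR2 sum unparenthesized mirrors the way the intermediate labels are written in the example; a caller that concatenates a KR2 result afterwards has to add the parentheses itself.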


Proposition 10 Let G = (X,U) be a K-graph without orbit. The graph G is a Glushkov K-graph if and only if it is K-reducible.


(⇒) This proposition is proved by induction on the length of the expression. First, for $\|E\| \leq 1$, there are only two kinds of proper K-expressions: $E = \lambda$ and $E = \lambda a \lambda'$, with $\lambda, \lambda' \in K$. When $E = \lambda$, the Glushkov K-graph has only two vertices, $s_I$ and $\Phi$, and the edge $(s_I, \Phi)$ is labeled with $\lambda$; the KR1 rule can then be applied. When $E = \lambda a \lambda'$, the Glushkov K-graph of E has three vertices and is K-reducible: indeed, the KR1 rule can be applied twice.

Suppose now that, for each proper K-expression E of length n, its Glushkov K-graph is K-reducible. We then have to show that the Glushkov K-graphs of the K-expressions $F = E + \lambda$, $F = E + \lambda a \lambda'$, $F = \lambda a \lambda' \cdot E$ and $F = E \cdot \lambda a \lambda'$ of length n+1 are K-reducible. Let us denote by R (respectively R') the sequence of rules which can be applied on $A_K(E)$ (respectively $A_K(F)$). In case $|X| \geq 3$, by Lemma 9, $R = \bar{R} r_o r_e$ (respectively $R' = \bar{R}' r_o r_e$).


Case $F = E + \lambda$. We have $Pos(F) = Pos(E)$, $First(F) = First(E)$, $Last(F) = Last(E)$, $Null(F) = Null(E) \oplus \lambda$ and, for all $i \in Pos(E)$, $Follow(F,i) = Follow(E,i)$. Every rule which can be applied on $A_K(E)$ and which does not modify the edge $(s_I, \Phi)$ can also be applied on $A_K(F)$.

If $A_K(F)$ has only two states, then $R = r$ is a KR1 rule, and $R' = r'$ is a KR1 rule where $r'$ is such that $k = Null(E) \oplus \lambda$. Otherwise, the edge $(s_I, \Phi)$ can only be reduced by a KR3 rule.

Suppose now that no KR3 rule modifying $(s_I, \Phi)$ can be applied on $A_K(E)$. Then a KR3 rule $r'$ can be applied on $A_K(F)$ with $k = \lambda$, and $A_K(F)$ can be reduced by $R' = \bar{R} r' r_o r_e$, where $R = \bar{R} r_o r_e$.

Let us now suppose that $r_1, r_2, \ldots, r_n$ is the subsequence of KR3 rules of R which modify the edge $(s_I, \Phi)$. Necessarily, $r_n$ acts on a state x which is ε-equivalent. If $Q^-(x) \neq \{s_I\}$ or $Q^+(x) \neq \{\Phi\}$, then $R' = R\, r_{n+1}$, where $r_n$ in R' is modified as follows: x is now quasi-ε-equivalent with $\gamma = \lambda$, and the rule $r_{n+1}$ is a KR3 rule on a state x which is ε-equivalent, with $k = \lambda$. Otherwise, two cases have to be distinguished. If $Null(E) \oplus \lambda = 0$, then the rule $r_n$ is no longer applicable on $A_K(F)$ (there is no edge between $s_I$ and $\Phi$), and the rule $r_{n-1}$ in R' now acts on an ε-equivalent vertex of $A_K(F)$. If $Null(E) \oplus \lambda \neq 0$, then $r_n$ can be applied on $A_K(F)$ with $k$ replaced by $k \oplus \lambda$.


Case $F = E + \lambda a \lambda'$. If $|Pos(E)| = n$, we have $Pos(F) = Pos(E) \cup \{n+1\}$, $First(F) = First(E) \cup \{(\lambda, n+1)\}$, $Last(F) = Last(E) \cup \{(\lambda', n+1)\}$, $Null(F) = Null(E)$, $Follow(F,i) = Follow(E,i)$ for all $i \in Pos(E)$, and $Follow(F, n+1) = \emptyset$. In this case, $R' = \bar{R}\, r\, r_o r_e$, where $R = \bar{R} r_o r_e$ and r is a KR2 rule with $\alpha_{s_I} = \beta_{\Phi} = 1$, $l_y = \lambda$ and $r_y = \lambda'$, so $A_K(F)$ is K-reducible.


Case $F = E \cdot \lambda a \lambda'$. If $|Pos(E)| = n$, we have $Pos(F) = Pos(E) \cup \{n+1\}$, $First(F) = First(E)$, $Last(F) = \{(\lambda', n+1)\}$, $Null(F) = 0$, $Follow(F,i) = Follow(E,i)$ for all $i \in Pos(E) \setminus P(Last(E))$, and $Follow(F,i) = Follow(E,i) \cup \{(\lambda, n+1)\}$ for all $i \in P(Last(E))$. Let $r_1, \ldots, r_n$ be the subsequence of K-rules modifying edges reaching $\Phi$. Necessarily, $n = 1$ and $r_1 = r_e$ (Lemma 9). Indeed, suppose that $n > 1$ and that there exists $j \neq i$ such that $r_j$ is a KR1, KR2 or KR3 rule. Then necessarily $|Q^-(\Phi)| > 1$, which contradicts our hypothesis. We then have $R' = R\, r_{n+1}$, where $r_1$, the KR1 rule of the sequence R from a vertex x to $\Phi$ labeled with $k_1$, is modified in R' as follows: $k = k_1 \otimes \lambda$. We also have $k = \lambda'$ for the rule $r_{n+1}$.

The case $F = \lambda a \lambda' \cdot E$ is proved similarly to the previous one, considering the rules modifying edges from $s_I$ (with $r_o$ instead of $r_e$).


(⇐) By induction on the number of states of the reducible K-graph G = (X,U). If $|X| = 2$, then $X = \{s_I, \Phi\}$ and the only K-expression is $E = \lambda$ with $\lambda \in K$. Let G' = (X',U') be the Glushkov K-graph obtained from E. By construction, $\lambda = U(s_I, \Phi) = E(s_I)$ and $\lambda = Null(E)$, so necessarily G' = G.

Assume the property holds for ranks below n+1, and let G be a partially reduced K-graph. Three cases can occur according to the graphical form of the partially reduced graph: either we have to apply the KR1 rule twice, or the KR3 rule once and the KR1 rule twice, if $X = \{s_I, x, \Phi\}$; or we have to apply the KR2 rule once and the KR1 rule twice, if $X = \{s_I, x, y, \Phi\}$. For each case, we successively compute the new expressions of the vertices and check that the Glushkov construction applied to the final K-expression yields G.


3.4 Several examples of use for K-rules 


For the KR2 rule, the first example is for transducers in $(K, \oplus, \otimes) = (\Sigma^* \cup \{\emptyset\}, \cup, \cdot)$, where "·" denotes the concatenation operator. In this case, the KR2 rule conditions can be expressed as follows. For all $q^-$ in $Q^-(x)$, $\alpha_{q^-}$ is the common prefix of $U(q^-,x)$ and $U(q^-,y)$. Likewise, for all $q^+$ in $Q^+(x)$, $\beta_{q^+}$ is the common suffix of $U(x,q^+)$ and $U(y,q^+)$.
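For this word-labeled case, the prefix and suffix computations can be sketched directly in Python. The sketch assumes that "the common prefix" is taken as the longest common prefix (and symmetrically for suffixes), and that Q⁻(x) = Q⁻(y) and Q⁺(x) = Q⁺(y) have already been checked; kr2_word_factors is a hypothetical helper name. It relies on os.path.commonprefix, which compares its arguments character by character.

```python
import os
from typing import Dict


def common_prefix(u: str, v: str) -> str:
    """Longest common prefix of two words (character by character)."""
    return os.path.commonprefix([u, v])


def common_suffix(u: str, v: str) -> str:
    """Longest common suffix, obtained by reversing both words."""
    return os.path.commonprefix([u[::-1], v[::-1]])[::-1]


def kr2_word_factors(pred_x: Dict[str, str], pred_y: Dict[str, str],
                     succ_x: Dict[str, str], succ_y: Dict[str, str]):
    """alpha_q = common prefix of U(q, x) and U(q, y) for q in Q-(x);
    beta_q = common suffix of U(x, q) and U(y, q) for q in Q+(x)."""
    alpha = {q: common_prefix(pred_x[q], pred_y[q]) for q in pred_x}
    beta = {q: common_suffix(succ_x[q], succ_y[q]) for q in succ_x}
    return alpha, beta
```

For instance, the labels "abc" and "abd" on the edges from the same predecessor give α = "ab", and "xa" and "ya" on the edges to the same successor give β = "a".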

The second one is in $(\mathbb{Z}/7\mathbb{Z}[i,j,k], \oplus, \otimes)$, where i, j, k are the quaternion units, ⊕ is the sum and ⊗ the product. In this case, K is a field, and every factorization leads to the result.


We now give a complete example using the three rules on the $(\mathbb{N} \cup \{+\infty\}, \min, +)$ semiring. This example illustrates the problem of quasi-ε-equivalence. In this example, we identify each vertex with its label.

[Figure: successive reductions of the example K-graph]
• A KR3 rule can be applied on x with l = 2, r = 5 and k = 6, which yields the vertex (2x5 + 6).
• A KR3 rule can be applied on y with l = 0, r = 2 and k = 1, which yields the vertex (0y2 + 1).
• A KR1 rule can be applied on (2x5 + 6) and on (0y2 + 1), which yields the vertex (2x5 + 6)(0y2 + 1).
• A KR2 rule can be applied on (2x5 + 6)(0y2 + 1) and on z, which yields the vertex (2x5 + 6)(0y2 + 1) + 2z.
• A KR1 rule can be applied to end the process.

This example leads to a possible K-expression such as E = ((2x5 + 6)(0y2 + 1) + 2z) + 3.


4 Glushkov K-graph with orbits 


We now consider a graph which has at least one maximal orbit O. We extend the notions of strong stability and strong transversality to the K-graphs obtained from K-expressions in SNF; the characterization has to be given on coefficients only. The stability and transversality notions are closely linked. Indeed, if we consider the states of In(O) as those of $O^+$, then both notions amount to transversality. Moreover, the extension of these notions to WFAs (K-stability, Definition 12, and K-transversality, Definition 14) involves the output and input vectors of O, whose product is exactly the orbit matrix of O (Proposition 17).
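To make the orbit vocabulary concrete, the following sketch computes the maximal orbits of a graph as its strongly connected components containing at least one edge (Kosaraju's algorithm), together with In(O), Out(O), O⁻ and O⁺. The definitions used here (In(O): vertices of O with a predecessor outside O; Out(O): vertices of O with a successor outside O; O⁻ and O⁺: the predecessors and successors of O lying outside O) are the usual ones for Glushkov graphs and are assumed rather than restated in this section; the function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Graph = Dict[str, Set[str]]  # every vertex is a key; value = its successors


def sccs(g: Graph) -> List[Set[str]]:
    """Strongly connected components (Kosaraju, recursive; small graphs)."""
    order: List[str] = []
    seen: Set[str] = set()

    def dfs1(v: str) -> None:
        seen.add(v)
        for w in g[v]:
            if w not in seen:
                dfs1(w)
        order.append(v)              # post-order finishing time

    for v in g:
        if v not in seen:
            dfs1(v)
    rev: Graph = {v: set() for v in g}
    for v in g:
        for w in g[v]:
            rev[w].add(v)
    comps: List[Set[str]] = []
    assigned: Set[str] = set()

    def dfs2(v: str, comp: Set[str]) -> None:
        assigned.add(v)
        comp.add(v)
        for w in rev[v]:
            if w not in assigned:
                dfs2(w, comp)

    for v in reversed(order):        # decreasing finishing time
        if v not in assigned:
            comp: Set[str] = set()
            dfs2(v, comp)
            comps.append(comp)
    return comps


def maximal_orbits(g: Graph) -> List[Set[str]]:
    """Components with at least one internal edge (size > 1 or a self-loop)."""
    return [c for c in sccs(g) if len(c) > 1 or any(v in g[v] for v in c)]


def orbit_interface(g: Graph, o: Set[str]) -> Tuple[Set[str], ...]:
    """Return (In(O), Out(O), O-, O+) for an orbit o of g."""
    in_o = {x for x in o for y in g if y not in o and x in g[y]}
    out_o = {x for x in o if any(y not in o for y in g[x])}
    o_minus = {y for y in g if y not in o and any(x in o for x in g[y])}
    o_plus = {y for x in o for y in g[x] if y not in o}
    return in_o, out_o, o_minus, o_plus
```

For the graph s → a → b → a, b → f, the only maximal orbit is {a, b}, with In(O) = {a}, Out(O) = {b}, O⁻ = {s} and O⁺ = {f}.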


Lemma 11 Let E be a K-expression and $G_K(E)$ its Glushkov K-graph. Let $O = (X_O, U_O)$ be a maximal orbit of $G_K(E)$. Then E contains a closure subexpression F such that $X_O = Pos(F)$.

This lemma is a direct consequence of Lemma 4.5 in [5] and of Lemma 4.
Definition 12 (K-stability) A maximal orbit O of a K-graph G = (X,U) is K-stable if
• O is stable, and
• the matrix $M_O \in K^{|Out(O)| \times |In(O)|}$ such that $M_O(s,e) = U(s,e)$ for each $(s,e) \in Out(O) \times In(O)$ can be written as a product VW of two vectors such that $V \in K^{|Out(O)| \times 1}$ and $W \in K^{1 \times |In(O)|}$.

The graph G is K-stable if each of its maximal orbits is K-stable.

If a maximal orbit O is K-stable, $M_O$ is a matrix of rank 1 called the orbit matrix. For a decomposition of $M_O$ into the product VW of two vectors, V will be called the tail-orbit vector of O and W the head-orbit vector of O.
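The rank-1 condition of Definition 12 can be tested mechanically. The sketch below, assuming the tropical semiring (N, min, +) with finite entries, tries to write a matrix M as V W, that is M[i][j] = V[i] + W[j]; the helper name minplus_rank1 is hypothetical. It normalizes W so that its minimum is 0, then checks every entry.

```python
from typing import List, Optional, Tuple

Matrix = List[List[int]]


def minplus_rank1(m: Matrix) -> Optional[Tuple[List[int], List[int]]]:
    """Return (V, W) with m[i][j] = V[i] + W[j] in (N, min, +), else None.

    W is normalised so that min(W) = 0; any other factorisation differs by a
    scalar moved from W to V. Assumes a non-empty matrix of finite naturals.
    """
    row0 = m[0]
    j0 = row0.index(min(row0))           # column where W vanishes
    w = [x - row0[j0] for x in row0]     # W = first row minus its minimum
    v = [row[j0] for row in m]           # V[i] = m[i][j0] - W[j0] = m[i][j0]
    for i, row in enumerate(m):
        if any(row[j] != v[i] + w[j] for j in range(len(w))):
            return None                  # not a tropical rank-1 matrix
    return v, w
```

For example, the matrix with rows (2, 0, 0) and (4, 2, 2) factors as V = (0, 2)ᵀ and W = (2, 0, 0), whereas the matrix with rows (0, 1) and (1, 0) has no such factorization.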
Lemma 13 A Glushkov K-graph obtained from a K-expression E in SNF is K-stable.

Proof Let G be the Glushkov K-graph of a K-expression E in SNF, $A_K(E) = (\Sigma, Q, s_I, F, \delta)$ its Glushkov WFA, and $O = (X_O, U_O)$ a maximal orbit of G. Following Lemma 4 and Theorem 5, G is strongly stable, which implies that every orbit of G is stable. Let $s_i \in Out(O)$, $1 \leq i \leq |Out(O)|$, and $e_j \in In(O)$, $1 \leq j \leq |In(O)|$. Following the extended Glushkov construction, and as $s_i \neq s_I$ for all $s_i \in Out(O)$, we have $\delta(s_i, a, e_j) = Coeff_{Follow(E,s_i)}(e_j)$. As O corresponds to a closure subexpression $F^*$ or $F^+$ (Lemma 11), and as $(s_i, a, e_j)$ is an edge of $X_O \times \Sigma \times X_O$, we have
$$\delta(s_i,a,e_j) = Coeff_{Follow(F^*,s_i)}(e_j) = Coeff_{Follow(F,s_i) \cup Coeff_{Last(F)}(s_i) \cdot First(F)}(e_j).$$
As E is in SNF, so are $F^*$ and $F^+$, and then
$$\delta(s_i,a,e_j) = Coeff_{Coeff_{Last(F)}(s_i) \cdot First(F)}(e_j) = Coeff_{Last(F)}(s_i) \otimes Coeff_{First(F)}(e_j).$$
The lemma is proved by choosing $V \in K^{|Out(O)| \times 1}$ such that $V(i,1) = Coeff_{Last(F)}(s_i)$ and $W \in K^{1 \times |In(O)|}$ with $W(1,j) = Coeff_{First(F)}(e_j)$. □


Definition 14 (K-transversality) A maximal orbit O of G = (X,U) is K-transverse if
• O is transverse,
• the matrix $M_e \in K^{|O^-| \times |In(O)|}$ such that $M_e(p,e) = U(p,e)$ for each $(p,e) \in O^- \times In(O)$ can be written as a product ZT of two vectors such that $Z \in K^{|O^-| \times 1}$ and $T \in K^{1 \times |In(O)|}$, and
• the matrix $M_s \in K^{|Out(O)| \times |O^+|}$ such that $M_s(s,q) = U(s,q)$ for each $(s,q) \in Out(O) \times O^+$ can be written as a product T'Z' of two vectors such that $T' \in K^{|Out(O)| \times 1}$ and $Z' \in K^{1 \times |O^+|}$.

The graph G is K-transverse if each of its maximal orbits is K-transverse.

If a maximal orbit O is K-transverse, $M_e$ (respectively $M_s$) is a matrix of rank 1 called the input matrix (respectively output matrix) of O. For a decomposition of $M_e$ (respectively $M_s$) into the product ZT (respectively T'Z') of two vectors, T will be called the input vector (respectively T' the output vector) of O.


Lemma 15 The Glushkov K-graph G = (X,U) of a K-expression E in SNF is K-transverse.

Proof Let O be a maximal orbit of G. Following Lemma 4 and Theorem 5, G is strongly transverse, which implies that O is transverse. By Lemma 11, there exists a maximal closure subexpression H such that $H = F^*$ or $H = F^+$. As E is in SNF, so is H. By the definition of the function Follow, we have in this case: for all $p \in Out(O)$ and all $q \in O^+$, $U(p,q) = Coeff_{Follow(E,p)}(q)$. We now have to distinguish three cases.

1. If $|O^+| = 1$, the result holds immediately. Indeed, the output matrix of O is then a vector.

2. If $O^+ = \{q_1, \ldots, q_n\}$ with $n > 1$ and $q_j \neq \Phi$ for all $1 \leq j \leq n$, then necessarily $O^+ = \bigcup_l P(First(H_l))$ with $H_l$ some subexpressions of E. Then $U(p,q_j) = Coeff_{Coeff_{Last(F)}(p) \cdot First(H_l)}(q_j)$ if $q_j \in P(First(H_l))$. As $q_j$ is a first position of only one subexpression, $U(p,q_j) = k_p \otimes Coeff_{First(H_l)}(q_j)$, where $k_p = Coeff_{Last(F)}(p)$, which concludes this case.

3. Now, if there exists $1 \leq j \leq n$ such that $q_j = \Phi$, then $U(p,q_j) = Coeff_{Last(F)}(p) \otimes k$, where k is the Null value of some subexpression following F, which does not depend on p.

The same reasoning can be used for the left part of the transversality. □


Definition 16 (K-balanced) The orbit O of a graph G is K-balanced if G is K-stable and K-transverse and if there exist an input vector T of O and an output vector T' of O such that the orbit matrix $M_O = T'T$. The graph G is K-balanced if every maximal orbit of G is K-balanced.
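Once candidate vectors are known, the last condition of Definition 16 is a single matrix comparison. This sketch, again assuming the tropical semiring (N, min, +), checks M_O = T'T for a given pair (T, T'); it does not search over all admissible pairs, and the helper names minplus_outer and is_k_balanced are hypothetical.

```python
from typing import List

Matrix = List[List[int]]


def minplus_outer(col: List[int], row: List[int]) -> Matrix:
    """Tropical product of a column by a row vector: (i, j) -> col[i] + row[j]."""
    return [[c + r for r in row] for c in col]


def is_k_balanced(m_o: Matrix, t_in: List[int], t_out: List[int]) -> bool:
    """Check M_O = T' T for the given input vector T (t_in) and output
    vector T' (t_out); only this candidate pair is tested."""
    return minplus_outer(t_out, t_in) == m_o
```

With T = (2, 0, 0) and T' = (0, 2)ᵀ, the product T'T is the matrix with rows (2, 0, 0) and (4, 2, 2).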


Proposition 17 A Glushkov K-graph obtained from a K-expression E in SNF is K-balanced.

Proof Lemma 13 shows that V, the tail-orbit vector of O, is such that $V(i,1) = Coeff_{Last(F)}(i)$ for all $i \in P(Last(F))$, which is, by Lemma 15, the output vector of O. The details of the proofs of these lemmas show in the same way that there exist a head-orbit vector and an input vector of O which are equal.


We can now define the recursive version of the K-balanced property for WFAs.


Definition 18 A K-graph is strongly K-balanced if (1) it has no orbit, or (2) it is K-balanced and, after deleting all the edges of $Out(O) \times In(O)$ for each maximal orbit O, it is strongly K-balanced.


Proposition 19 A Glushkov K-graph obtained from a K-expression E in SNF is strongly K-balanced.

Proof Let G be the Glushkov K-graph of a K-expression E in SNF and O be a maximal orbit of G. The Glushkov K-graph G is strongly stable and strongly transverse. As E is in SNF, the edges of $Out(O) \times In(O)$ that are deleted are backward edges of a unique closure subexpression $F^*$ or $F^+$. Consequently, the recursive process of edge removal deduced from the definition of strong K-stability produces only maximal orbits which are K-balanced. The orbit O is therefore strongly K-balanced.
Theorem 20 Let G = (X,U). G is the Glushkov K-graph of a K-expression E in SNF if and only if

• G is strongly K-balanced, and

• the graph without orbit of G is K-reducible.

Proof Let G = (X,U) be a Glushkov K-graph. By Proposition 19, G is strongly K-balanced, and the graph without orbit of G is K-reducible (Proposition 10). For the converse part of the theorem, if G has no orbit and is K-reducible, the result holds immediately by Proposition 10. Let O be a maximal orbit of G. As G is strongly K-balanced, we can write the orbit matrix of O as $M_O = VW$, and there exist an output vector T' equal to the tail-orbit vector V and an input vector T equal to the head-orbit vector W. If the graph without orbit of O corresponds to a K-expression F, then O corresponds to the K-expression $F^+$, where $Coeff_{First(F^+)}(i) = W(1,i)$ for all $i \in P(First(F^+))$ and $Coeff_{Last(F^+)}(j) = V(j,1)$ for all $j \in P(Last(F^+))$. We also have $Coeff_{Follow(F^+,j)}(i) = Coeff_{Follow(F,j) \cup Coeff_{Last(F)}(j) \cdot First(F)}(i)$ for all $j \in P(Last(F))$ and $i \in P(First(F))$. Hence the Glushkov functions are well defined.

We now have to show that the graph without orbit of O can be reduced to a single vertex. By successive applications of the K-rules, the vertices of the graph without orbit of O can be reduced to a single state (giving a K-rational expression for O). Indeed, as O is transverse, no K-rule involving one vertex of O and one vertex outside O can be applied.
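The Follow function of $F^+$ used in the proof above can be computed from those of F. This is a minimal sketch in the tropical semiring (N ∪ {+∞}, min, +), where Follow, First and Last are encoded as dictionaries from positions to coefficients and plus_follow is a hypothetical helper. When the same position is reached twice, its coefficients are ⊕-added (min); as the conclusion notes, such double contributions arise when the expression is not in SNF.

```python
from typing import Dict

Coeffs = Dict[int, int]  # position -> coefficient in (N ∪ {+inf}, min, +)
INF = float("inf")


def plus_follow(follow_f: Dict[int, Coeffs], first_f: Coeffs,
                last_f: Coeffs) -> Dict[int, Coeffs]:
    """Follow(F+, j) = Follow(F, j) ∪ Coeff_Last(F)(j) · First(F).

    Outside SNF the same position may be produced twice; its coefficients
    are then ⊕-added, i.e. the minimum is kept.
    """
    result = {j: dict(fs) for j, fs in follow_f.items()}
    for j, lj in last_f.items():
        row = result.setdefault(j, {})
        for i, fi in first_f.items():
            row[i] = min(row.get(i, INF), lj + fi)  # ⊕ of lj ⊗ fi
    return result
```

For instance, if position 2 is the only last position of F, with coefficient 3, and position 1 the only first position, with coefficient 2, the back edge from 2 to 1 receives the coefficient 3 ⊗ 2 = 5.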


5 Algorithm for orbit reduction 


In this section, we present a recursive algorithm that computes a K-expression from a Glushkov K-graph. We then give an example which illustrates this method.

Algorithms


ORBITREDUCTION(G)
> Input: A K-graph G = (X, U)
> Output: A newly computed graph without orbit
Begin
  for each maximal orbit O = (X_O, U_O) of G do
    if BACKEDGESREMOVAL(O, T, T', Z, Z') then
      if ORBITREDUCTION(O) then
        if EXPRESSION(E_O, O, T, T') then
          REPLACESTATES(G, O, E_O, Z, Z')
        else return False
      else return False
    else return False
  return True
End


The BACKEDGESREMOVAL function deletes the edges of O from Out(O) to In(O) and returns true if the vectors T, T', Z, Z' (as defined in Definition 14) can be computed, false otherwise.

The EXPRESSION function returns true if O is K-reducible; in that case it computes the K-expression E of $G' = (X_O \cup \{s_I, \Phi\}, U')$, where $U' \leftarrow U_O \cup \{(s_I, T(1,j), e_j) \mid e_j \in In(O)\} \cup \{(s_i, T'(i,1), \Phi) \mid s_i \in Out(O)\}$, and outputs $E_O \leftarrow E^*$. It returns false otherwise.

The REPLACESTATES function replaces O by one state x labeled $E_O$ and connected to $O^-$ and $O^+$ with the coefficients of Z and Z' respectively. Formally, $G = ((X \setminus X_O) \cup \{x\}, U)$ with $U \leftarrow U \setminus \{(u,k,v) \mid u \in X_O \text{ or } v \in X_O\} \cup \{(p_j, Z(j,1), x) \mid p_j \in O^-\} \cup \{(x, Z'(1,i), q_i) \mid q_i \in O^+\}$.
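The graph surgery performed by REPLACESTATES can be sketched on an edge dictionary. In this hedged sketch, replace_states is a hypothetical helper, edges map (source, target) pairs to coefficients, Z and Z' are given as dictionaries indexed by the vertices of O⁻ and O⁺, and the label E_O of the new vertex is not modeled.

```python
from typing import Dict, Set, Tuple

Edges = Dict[Tuple[str, str], int]


def replace_states(vertices: Set[str], edges: Edges, orbit: Set[str],
                   z: Dict[str, int], z_prime: Dict[str, int],
                   x: str = "x") -> Tuple[Set[str], Edges]:
    """Replace the orbit by a single vertex x (sketch of REPLACESTATES).

    z maps each p in O- to Z(p), z_prime maps each q in O+ to Z'(q).
    Edges with an end in the orbit are dropped, then O- -> x and x -> O+
    edges are added with the coefficients of Z and Z'.
    """
    new_vertices = (vertices - orbit) | {x}
    new_edges = {(u, v): w for (u, v), w in edges.items()
                 if u not in orbit and v not in orbit}
    for p, w in z.items():
        new_edges[(p, x)] = w
    for q, w in z_prime.items():
        new_edges[(x, q)] = w
    return new_vertices, new_edges
```

Dropping every edge with an end in the orbit and then adding the O⁻ → x and x → O⁺ edges keeps the function independent of how the orbit was reduced internally.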




BACKEDGESREMOVAL(O, M_e, M_s, T, T', Z, Z')
> Input: a K-graph O = (X_O, U_O), M_e ∈ K^{|O⁻|×|In(O)|}
> Input: M_s ∈ K^{|Out(O)|×|O⁺|}
> Output: T ∈ K^{1×|In(O)|}, T' ∈ K^{|Out(O)|×1}, Z ∈ K^{|O⁻|×1}, Z' ∈ K^{1×|O⁺|}
Begin
  for each line l of M_e do
    gcd_l(l) ← LEFT GCD of all values of the line l
  > gcd_l is the vector of the gcd_l(l) values
  find a row vector R_e such that M_e = gcd_l ⊗ R_e
  if R_e does not exist then
    return False
  for each column c of M_s do
    gcd_r(c) ← RIGHT GCD of all values of the column c
  > gcd_r is the vector of the gcd_r(c) values
  find a column vector C_s such that M_s = C_s ⊗ gcd_r
  if C_s does not exist then
    return False
  find k such that M_O = C_s ⊗ k ⊗ R_e
  > M_O ∈ K^{|Out(O)|×|In(O)|} is the orbit matrix of O
  if k does not exist then
    return False
  A ← RIGHT GCD of all values of the gcd_l vector
  B ← LEFT GCD of all values of the gcd_r vector
  k₁ ← LEFT GCD(B, k)
  find k₂ such that k = k₁ ⊗ k₂
  if RIGHT GCD(k₂, A) ≠ k₂ then
    return False
  T ← k₂ ⊗ R_e
  T' ← C_s ⊗ k₁
  find Z such that gcd_l = Z ⊗ k₂
  find Z' such that gcd_r = k₁ ⊗ Z'
  delete any edge from Out(O) to In(O)
  return True
End
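The vector computations of BACKEDGESREMOVAL can be sketched in the tropical semiring (N, min, +), where, for finite naturals, LEFT GCD and RIGHT GCD are both min and "a = b ⊗ c" means c = a − b. The sketch below is a hedged illustration under these assumptions: back_edges_removal is a hypothetical name, the matrices are plain lists of finite naturals, and the final edge deletion is left out since no graph is modeled. The assertions use the matrices M_e = ((4, 2, 2), (5, 3, 3)) and M_s = ((1, 2, 3), (3, 4, 5)) of the illustrated example, with the orbit matrix ((2, 0, 0), (4, 2, 2)).

```python
from typing import List, Optional, Tuple

Matrix = List[List[int]]
Result = Tuple[List[int], List[int], List[int], List[int]]


def back_edges_removal(m_e: Matrix, m_s: Matrix,
                       m_o: Matrix) -> Optional[Result]:
    """Tropical sketch of BACKEDGESREMOVAL; returns (T, T', Z, Z') or None."""
    # Lines of M_e: gcd_l(l) = min of line l, then M_e = gcd_l ⊗ R_e.
    gcd_l = [min(row) for row in m_e]
    r_e = [x - gcd_l[0] for x in m_e[0]]
    if any(m_e[i][j] != gcd_l[i] + r_e[j]
           for i in range(len(m_e)) for j in range(len(r_e))):
        return None
    # Columns of M_s: gcd_r(c) = min of column c, then M_s = C_s ⊗ gcd_r.
    gcd_r = [min(col) for col in zip(*m_s)]
    c_s = [row[0] - gcd_r[0] for row in m_s]
    if any(m_s[i][j] != c_s[i] + gcd_r[j]
           for i in range(len(m_s)) for j in range(len(gcd_r))):
        return None
    # Orbit matrix: M_O = C_s ⊗ k ⊗ R_e for a single scalar k >= 0.
    ks = {m_o[i][j] - c_s[i] - r_e[j]
          for i in range(len(m_o)) for j in range(len(m_o[0]))}
    if len(ks) != 1 or min(ks) < 0:
        return None
    k = ks.pop()
    a = min(gcd_l)        # RIGHT GCD of the gcd_l vector
    b = min(gcd_r)        # LEFT GCD of the gcd_r vector
    k1 = min(b, k)        # LEFT GCD(B, k)
    k2 = k - k1           # k = k1 ⊗ k2
    if min(k2, a) != k2:  # RIGHT GCD(k2, A) must equal k2
        return None
    t = [k2 + w for w in r_e]
    t_prime = [w + k1 for w in c_s]
    z = [w - k2 for w in gcd_l]        # gcd_l = Z ⊗ k2
    z_prime = [w - k1 for w in gcd_r]  # gcd_r = k1 ⊗ Z'
    return t, t_prime, z, z_prime
```

On these data the sketch returns T = (2, 0, 0), T' = (0, 2)ᵀ, Z = (2, 3)ᵀ and Z' = (1, 2, 3), and it rejects an orbit matrix that is not of the form C_s ⊗ k ⊗ R_e.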
Illustrated example

We illustrate the characteristics of Glushkov WFAs developed in this paper with a reduction example in the $(\mathbb{N} \cup \{+\infty\}, \min, +)$ semiring. This example deals with the reduction of an orbit and its connection to the outside. We first reduce the orbit to one state and replace the orbit by this state in the original graph. This new state is then linked to the predecessors (respectively successors) of the orbit with the vector Z (respectively Z') as edge labels.

Let G be the K-subgraph of Figure 6 and let O be the only maximal orbit of G, with $X_O = \{a_1, b_2, c_3, a_4, b_5, b_6, c_7\}$.


Figure 6: An example for orbit reduction 


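We have
$$M_s = \begin{pmatrix} 1 & 2 & 3 \\ 3 & 4 & 5 \end{pmatrix}, \qquad M_e = \begin{pmatrix} 4 & 2 & 2 \\ 5 & 3 & 3 \end{pmatrix}.$$
We can check that O is K-transverse:
$$M_s = \begin{pmatrix} 0 \\ 2 \end{pmatrix} \begin{pmatrix} 1 & 2 & 3 \end{pmatrix} = T'Z' \quad \text{and} \quad M_e = \begin{pmatrix} 2 \\ 3 \end{pmatrix} \begin{pmatrix} 2 & 0 & 0 \end{pmatrix} = ZT.$$
We then verify that the orbit O is K-stable:
$$M_O = \begin{pmatrix} 2 & 0 & 0 \\ 4 & 2 & 2 \end{pmatrix} = \begin{pmatrix} 0 \\ 2 \end{pmatrix} \begin{pmatrix} 2 & 0 & 0 \end{pmatrix} = VW.$$
We easily check that the orbit is K-balanced: there is an input vector T which is equal to W and an output vector T' which is equal to V.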


Then, we delete the back edges and add the vertices $s_I$ and $\Phi$ for the orbit O. The vertex $s_I$ is connected to In(O), with the values of the vector T as edge labels. Every vertex of Out(O) is connected to $\Phi$, with the values of the vector T' as edge labels. The resulting graph is then reduced to one state by iterated applications of K-rules.
The expression F associated with this graph is replaced by $F^*$, and the states of $O^-$ (respectively $O^+$) are connected to the newly computed state with Z (respectively Z') as vector of coefficients.

[Figure: the orbit O replaced by a single state labeled ((2a + b3 + c2) · a · b · (4b + 5c2))*]


6 Conclusion 


While trying to characterize Glushkov K-graphs, we have pointed out an error in the paper by Caron and Ziadi [5], which we have corrected. This correction allowed us to extend the characterization to K-graphs, restricting K to factorial semirings or fields. For fields, the conditions of application of the K-rules are sufficient to obtain an algorithm.

For strict semirings, this restriction allowed us to work with gcds and thus to give algorithms computing K-expressions from Glushkov K-graphs.

This characterization is divided into two main parts. The first one is the reduction of an acyclic Glushkov K-graph into one single vertex labeled with the whole K-expression. Thanks to the confluence of the K-rules, we can be sure that this algorithm terminates without performing a depth-first search. The second one relies on orbit properties. These criteria allow us to give an algorithm computing a single vertex from each orbit.

In case the expression is not in SNF or the semiring is not zero-divisor free, some edges are computed several times (their coefficients are ⊕-added), which implies that some edges may be deleted. Then this characterization does not hold. A question then arises: the factorial condition is a sufficient condition to have an algorithm; is it also a necessary condition?